Abstract
Proteins often accumulate neutral mutations that do not affect current functions but can profoundly influence future mutational possibilities and functions. Understanding such hidden potential has major implications for protein design and evolutionary forecasting but has been limited by a lack of systematic efforts to identify potentiating mutations. Here, through the comprehensive analysis of a bacterial toxin–antitoxin system, we identified all possible single substitutions in the toxin that enable it to tolerate otherwise interface-disrupting mutations in its antitoxin. Strikingly, the majority of enabling mutations in the toxin do not contact the antitoxin residues whose mutations they rescue and instead promote tolerance non-specifically to many different antitoxin mutations, despite covariation in homologues occurring primarily between specific pairs of contacting residues across the interface. In addition, the enabling mutations we identified expand future mutational paths that both maintain old toxin–antitoxin interactions and form new ones. These non-specific mutations are missed by widely used covariation and machine learning methods. Identifying such enabling mutations will be critical for ensuring continued binding of therapeutically relevant proteins, such as antibodies, aimed at evolving targets.
Acknowledgements
We thank members of the Laub and Marks laboratories, A. Batchelor, C. McClune, J. Ingraham, A. Schoech and I. Cvijovic for helpful discussions. We thank A. Murray, N. Gauthier, T. Okubo, S. Sinai and N. Youssef for feedback on the manuscript and M. Stiffler for sharing protocols before publication. This work was supported by the Howard Hughes Medical Institute (M.T.L.), National Institutes of Health grant no. R01CA260415 (D.S.M.), Chan Zuckerberg Initiative CZI2018-191853 (D.S.M.), Ashford PhD fellowship (D.D.), Boehringer Ingelheim Funds PhD fellowship (D.D.), Fanny and John Hertz Fellowship (E.N.W.), National Institutes of Health NLM training grant no. T15LM007092 (A.G.G.), National Institutes of Health grant no. T32GM007753 (T.-L.V.L.), Jane Coffin Childs Memorial Fund for Medical Research fellowship (B.W.) and National Institutes of Health grant no. K99GM135536 (B.W.).
Author information
Contributions
D.D., D.S.M. and M.T.L. conceived the project and wrote the paper. D.D. designed and performed experiments, analysed data and built the quantitative models. A.G.G. performed covariation analysis for ~350 protein–protein interactions. B.W. helped with library transformations. T.-L.V.L. created the combinatorial antitoxin mutant library. E.N.W. suggested helpful tips on Bayesian modelling. D.S.M. and M.T.L. supervised the project.
Ethics declarations
Competing interests
D.S.M. is an advisor for Dyno Therapeutics, Octant, Jura Bio, Tectonic Therapeutics and Genentech and a cofounder of Seismic. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Ecology & Evolution thanks Hsin-Hung Chou and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Orthogonal validation of growth rate inference, structural explanation for antitoxin mutation effects, and covariational signal between toxin–antitoxin ParE3/ParD3.
a, Comparison of growth rates inferred by high-throughput vs. individual growth measurement. X axis error bars indicate ±2 standard deviations derived from n = 10 or n = 11 technical plate reader replicates (see Methods). Y axis error bars indicate 95% posterior highest density interval. The Pearson correlation coefficient (r) is indicated. b, Raw log-read ratio reproducibility between replicates (+1 pseudocount) for all single and double mutants. The Pearson correlation coefficient (r) is indicated. c, Mean mutation effect of residues in the C-terminal α-helix 3 of the ParD3 antitoxin indicates that residues facing the toxin are more susceptible to mutations that disrupt the ParD3–ParE3 interaction, producing negative Δgrowth rate values. d, Mean mutation effect in the N-terminal oligomerization region of the antitoxin highlights residues susceptible to disrupting the ParD3–ParE3 interaction when mutated. Cartoon illustrates arrangement of ParE3–ParD3 octamer observed in the co-crystal structure (PDB: 5CEG). One of the 4 antitoxin monomers is coloured by the mean mutation effect. e, Top 10 toxin–antitoxin covarying residue pairs indicated for reference. f, The 90% precision cutoff yields 29 toxin–antitoxin covarying residue pairs (black in upper-right quadrant), of which 28 pairs fall within toxin–antitoxin interface residues that are < 6 Å minimum atom distance (ochre dots) in the ParE3–ParD3 crystal structure (PDB ID: 5CEG).
Extended Data Fig. 2 Titration of toxin and antitoxin expression levels, and sensitive identification of toxin substitutions which do not disrupt toxicity.
a, Cartoon illustration of the expression system. IPTG induces antitoxin, arabinose induces toxin. b, Growth rate of cells harbouring wild-type toxin ParE3 without antitoxin at different arabinose induction levels in arabinose-titratable E. coli strain BW27783. c, d, Growth rate of cells harbouring wild-type toxin–antitoxin ParE3/ParD3 under different antitoxin induction levels modulated with IPTG and 0.00012% arabinose induction (c) or 0.0008% arabinose induction (d). e, Distribution of ∆growth rates(T*-T) for all toxin single substitutions under different arabinose inducer concentrations, with positive ∆growth rate(T*-T) values indicating loss of toxin function. The set of ‘most toxic’ toxin substitutions (n = 310) is coloured in light blue, the set of ‘toxic’ substitutions (n = 781) is coloured in green (see Methods). Other classes of substitutions are indicated. The dynamic range (difference between 0 and the truncated toxin mutants) shrinks, as expected, for lower expression levels that do not fully inhibit growth with the wild-type toxin, and a higher fraction of mutants show loss of toxicity (higher ∆growth rates) under lower expression conditions. The toxin substitution A28Q is highlighted (dark blue) as an example that shows no growth rate difference relative to wild-type toxin at high expression conditions, but is not as toxic as wild-type toxin at lower expression conditions. f, Schematic illustrating loss of toxicity detection using growth rate measurements in different expression regimes. g, Mean ∆growth rates(T*-T) of residue positions mapped onto the ParE3 toxin structure. Values shown for 0.00012% [arabinose] inducer. h, The mean ∆growth rates(T*-T) of a residue are correlated with the relative solvent accessibility of the residue (Pearson r = −0.66). Values shown for 0.00012% [arabinose] inducer. 
i,j, Distribution of ∆growth rate(T*-T) for all toxin substitutions (black) or top 10 coevolving residue substitutions (purple) in the toxin in the absence of antitoxin (i) or presence of antitoxin (j). Values shown for 0.00012% [arabinose] inducer, and antitoxin is induced with 10 µM IPTG. k, The ∆growth rate(T*-T) values of each substitution at any position along the toxin ParE3. Green highlights the top 10 covarying positions between toxin and antitoxin in natural homologues. Values shown for 0.00012% [arabinose] inducer.
Extended Data Fig. 3 Volcano plot visualizing significant and substantial beneficial toxin variants in different antitoxin backgrounds, and beneficial toxin variants in various antitoxin backgrounds under ‘high’ and ‘low’ antitoxin expression conditions.
a, For each deleterious antitoxin variant background, the mean posterior change in the number of doublings, ∆growth rate(T*/AT* - T/AT*), of each of the most toxic toxin mutants is plotted vs. the significance (-log10(p(∆growth rate<0))) of its deviation from the AT* single mutation. This is based on 10,000 discrete samples of the posterior ∆growth rate(T*/AT* - T/AT*) values inferred from the hierarchical Bayesian inference model (see Methods). Vertical line: +0.5 ∆growth rate, horizontal line: p(∆growth rate<0) = 0.0001. Red indicates significant and substantial beneficial toxin substitutions using this cutoff. Experiments performed under ‘high antitoxin’ expression conditions. b, The minimum atom distance from a given deleterious antitoxin residue to each beneficial toxin substitution is plotted vs. ∆growth rate(T*/AT* - T/AT*). Experiments performed under ‘high antitoxin’ expression conditions. c, The minimum atom distance from a given deleterious antitoxin residue to each beneficial toxin substitution is plotted vs. ∆growth rate(T*/AT* - T/AT*). Experiments performed under ‘low antitoxin’ expression conditions. d, Distance vs. ∆growth rate(T*/AT* - T/AT*) of beneficial toxin variants for all deleterious antitoxin variant backgrounds. Experiments performed under ‘low antitoxin’ expression conditions. Values for (b-d) shown for double mutants with ∆growth rate effect size > +0.5 and p(∆growth rate<0) < 0.0001.
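The cutoff described in this caption can be illustrated with a minimal sketch. It assumes that the posterior samples for one double mutant are available as a plain array and that significance is the fraction of samples below zero, as on the y axis of panel a. The function name and arguments are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def is_beneficial(delta_samples, effect_cutoff=0.5, p_cutoff=1e-4):
    """Flag a toxin substitution as significantly and substantially beneficial.

    delta_samples: posterior samples of the change in growth rate
    (T*/AT* minus T/AT*) for one toxin/antitoxin mutant pair (hypothetical input).
    """
    delta_samples = np.asarray(delta_samples, dtype=float)
    mean_effect = delta_samples.mean()        # effect size (x axis of the volcano plot)
    p_below_zero = np.mean(delta_samples < 0) # posterior mass below zero (significance)
    return bool(mean_effect > effect_cutoff and p_below_zero < p_cutoff)
```

With 10,000 posterior samples, a threshold of 0.0001 means that a single sample below zero is enough to fail the significance criterion.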
Extended Data Fig. 4 A non-specific, nonlinear model can explain most of the observed single and double-mutant growth rates.
a, Schematic of nonlinear, non-specific model: double-mutant expected growth rates (brown) are based on the independent (non-specific) sum of underlying toxin and antitoxin mutant effects, passed through a sigmoid function (yellow). b,c, Residuals for nonlinear, non-specific model (b) or linear non-specific model of the same structure without a non-linearity (c) showing unbiased residuals for the nonlinear model, but a complete misfit of the linear model. Model built using ‘high antitoxin’ expression levels. Explained variance (R²) is indicated. Significant and substantially positively (dark green) or negatively (green) deviating mutations are shown in (b) (see Methods). d, Inferred independent toxin single-substitution effects among the set of most toxic toxin mutants demonstrating a tail of independently beneficial toxin variants. Experiment performed under ‘high antitoxin’ expression levels. e,f, Nonlinear independent model fit to growth rates measured under ‘high antitoxin’ (e) or ‘low antitoxin’ (f) expression conditions. The wild-type toxin–antitoxin pair is inferred to be differently close to the sigmoid ‘cliff’ between expression conditions. g, Cartoon illustrating different detection of single-mutant effects depending on expression conditions. h-j, Correlation of inferred single-mutant effects (h), observed single-mutant ∆growth rate(T*/AT* - T/AT) effects (i), and double-mutant deviations of observed from expected growth rates (j) from separate inference under ‘high antitoxin’ (x axis) or ‘low antitoxin’ (y axis) expression conditions.
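The structure of the non-specific, nonlinear model in panel a can be sketched as follows. This is an illustration under stated assumptions, not the fitted model: the logistic form, the parameter names (wt_offset, lower, upper) and the numerical values are all hypothetical. The caption only specifies that independent toxin and antitoxin effects are summed and passed through a sigmoid.

```python
import numpy as np

def sigmoid(x, lower=0.0, upper=1.0):
    # Logistic curve rescaled to run from `lower` to `upper` (assumed form).
    return lower + (upper - lower) / (1.0 + np.exp(-x))

def expected_growth(toxin_effect, antitoxin_effect, wt_offset=0.0,
                    lower=0.0, upper=1.0):
    # Non-specific model: the two effects add on a latent scale and do not
    # depend on each other; only the sigmoid couples them in the observable.
    latent = wt_offset + toxin_effect + antitoxin_effect
    return sigmoid(latent, lower, upper)

# The same toxin effect (+2) barely changes growth when the pair sits on the
# plateau, but gives a large gain next to a deleterious antitoxin mutation (-6).
gain_wt = expected_growth(2.0, 0.0, wt_offset=4.0) - expected_growth(0.0, 0.0, wt_offset=4.0)
gain_mut = expected_growth(2.0, -6.0, wt_offset=4.0) - expected_growth(0.0, -6.0, wt_offset=4.0)
```

This shows why a toxin substitution can look neutral on its own yet rescue many different antitoxin mutations: the position of the pair relative to the sigmoid ‘cliff’ determines how much of the latent effect is visible.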
Extended Data Fig. 5 Deviation of observed from expected double-mutant growth rates reveals toxin variants with specific or with only non-specific beneficial effects, and fraction of specific vs. non-specific toxin variants.
a, For each beneficial toxin mutation (indicated above each plot) combined with each antitoxin variant indicated on the x axis, the plot shows the growth rate relative to the wild-type toxin–antitoxin pair (mean posterior ∆growth rate(T*/AT* - T/AT)). Grey dots represent T*/AT*, error bars indicate 95% posterior highest density interval. The ∆growth rate for each antitoxin mutant combined with wild-type toxin (T/AT*) is shown (black dots) along with the ∆growth rate for T*/AT* expected under the non-specific, nonlinear model (green dots). b, Deviation of the observed (dots) from the expected double-mutant growth rates (orange line) highlights classification of specific and non-specific toxin variants. Beneficial toxin substitutions (rows, n = 32) ordered by their range of growth rate deviations across deleterious antitoxin variants as in panel b. c-g, Specific vs. non-specific enabling toxin variants under ‘high’ antitoxin expression for all enabling toxin variants grouped by deleterious antitoxin for the more stringent set of 310 ‘most toxic’ toxins (c) and less stringent set of 781 ‘toxic’ toxins (d). Orange and purple indicate mutant pairs involving non-specific and specific, respectively, rescuing mutations in the toxin. Enabling toxin variants under ‘low’ antitoxin expression at different absolute growth rate cutoffs relative to the wild-type toxin/antitoxin growth rate (e), or grouped by ‘most toxic’ (f) or ‘toxic’ (g) toxin variants. h, Inferred non-specific toxin variant effect vs. minimum atom distance to any antitoxin atom for 21 non-specifically rescuing toxin variants (orange). i, j, For specific and non-specific beneficial toxin mutants, the change in growth rate in a deleterious antitoxin mutant background, ∆growth rate (T*/AT* - T/AT*), is plotted vs. minimum atom distance to the deleterious antitoxin mutation it rescues (i) or any antitoxin atom (j) in the ‘low antitoxin’ expression condition.
Extended Data Fig. 6 Natural sequence statistics, EVcouplings or DeepSequence models are not predictive of beneficial toxin substitution effects.
a, Distribution of number of specific and non-specific beneficial toxin substitutions (purple) vs. all possible toxin variants (grey) observed in natural sequences. b, Frequency distribution of beneficial toxin and deleterious antitoxin mutant pairs in natural sequences, with 29/51 pairs never observed. c-e, Effect size of toxin variant rescue vs. frequency of variant pair in natural sequences (c), conditional frequency of toxin variant given natural sequences containing the particular deleterious antitoxin substitution (d), or enrichment of beneficial toxin variant in natural sequences containing the deleterious antitoxin substitution (e). f-g, EVcouplings model inferred site-wise toxin mutant preferences (h_i) vs. toxin mutant effect inferred in suppressor scan with the Pearson correlation coefficient indicated (f), or EVcouplings pairwise T*/AT* variant preference (J_ij) vs. effect size of beneficial toxin mutation effect in a deleterious antitoxin variant background (g). h, Scatterplot of observed beneficial toxin effect in deleterious antitoxin mutant backgrounds (AT*) vs. EVmutation (top row) or DeepSequence (variational auto-encoder) mutation effect predictions (bottom row). Pearson correlation (r) is indicated. i, Distribution of natural sequence identity fractions across the alignment. Different histograms illustrate fraction mutated for homologues containing the full concatenated toxin and antitoxin (grey), the toxin homologues only (blue), or the antitoxin homologues only (turquoise).
Extended Data Fig. 7 Non-specific suppressor toxin ParE3 variants are as or almost as toxic as wild-type ParE3, and reproducibility of antitoxin combinatorial variant log-read ratios.
a, Growth rates of ParE3 non-specific suppressor toxin variants (blue) compared to wild-type toxin ParE3 without antitoxin (black) and wild-type toxin and antitoxin (grey) under fully inhibitory toxin expression conditions (0.00012% [arabinose]) or half-maximal inhibitory expression conditions (0.00006% [arabinose]). Dark lines represent the mean OD600, shaded regions show standard deviation of the replicates (n = 10 or n = 11). b, Raw log-read ratio reproducibility between biological replicates (+1 pseudocount) for the combinatorial antitoxin library (8000 amino acid variants) in different toxin mutant backgrounds. Specific classes of antitoxin mutants, and Pearson correlation coefficients (r) are indicated.
Extended Data Fig. 8 Bayesian hierarchical model.
a, Simplified description of the Bayesian hierarchical model. Pre- and post-selection reads for each codon are drawn from a Poisson distribution. The log-ratios of these Poisson parameters are not fixed between synonymous codons but are instead drawn from a normal distribution, whose mean forms the amino acid mutant growth rate of interest. This model allows for different synonymous codons to inform each other as well as the amino acid mutant growth rate without being completely fixed. b, Full plate diagram description of the hierarchical Bayesian model capturing both replicates. Replicate index i takes values 1 or 2, amino acid index m takes on values ranging from 1-2040 (20*102) for the toxin or 1-1840 (92 * 20) for the antitoxin, codon index n takes on values ranging from 1-6426 (63*102) for the toxin or 1-5796 (63*92) for the antitoxin. Circles indicate random variables, grey circles represent observed random variables. c, Description of variables, likelihood function and priors used. The likelihood function incorporates maximum entropy distributions for the observed variables, and the priors incorporate computationally tractable, vague priors for the amino acid substitution growth rates. The relative priors on the standard deviation of replicate σ_repn vs. synonymous variant σ_synm reflect our prior belief that replicate experiment noise is larger than synonymous mutant noise. σ_bi and r_scale have improper priors.
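The generative structure in panel a can be sketched by forward simulation. This is a simplified illustration for a single replicate and a single amino acid variant, assuming a common pre-selection read depth for all codons. The function names, arguments and values are hypothetical, and the sketch omits the replicate level, priors and inference of the full model.

```python
import numpy as np

def simulate_codon_counts(aa_growth_rate, n_codons, sigma_syn, mean_pre_reads, rng):
    # Each synonymous codon gets its own log-ratio, drawn from a normal
    # distribution centred on the amino acid variant's growth rate.
    codon_log_ratio = rng.normal(aa_growth_rate, sigma_syn, size=n_codons)
    pre_rate = np.full(n_codons, float(mean_pre_reads))
    post_rate = pre_rate * np.exp(codon_log_ratio)
    # Observed pre- and post-selection reads are Poisson draws.
    pre_reads = rng.poisson(pre_rate)
    post_reads = rng.poisson(post_rate)
    return pre_reads, post_reads

def mean_log_read_ratio(pre_reads, post_reads, pseudocount=1):
    # Simple summary statistic (+1 pseudocount) to compare against model-based estimates.
    return float(np.mean(np.log((post_reads + pseudocount) / (pre_reads + pseudocount))))

rng = np.random.default_rng(0)
pre, post = simulate_codon_counts(-1.0, n_codons=4, sigma_syn=0.1,
                                  mean_pre_reads=1000, rng=rng)
summary = mean_log_read_ratio(pre, post)
```

Because the codon-level log-ratios share a mean but are not forced to be identical, synonymous codons inform the amino acid estimate while retaining their own variation.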
Extended Data Fig. 9 Validation of Bayesian growth rate inference on synthetic datasets.
a, Three different true synthetic growth rate distributions used for simulating pre- and post-selection codon variant read count data. Synthetic growth rate distributions were chosen from observed toxin single-mutant growth rate distributions in 3 different antitoxin backgrounds, spanning the range of distributions observed. b,c, Inferred growth rates using the Bayesian hierarchical model (b) show less bias and incorporate uncertainty estimates compared to mean log-read ratio summary of pre- and post-selection read counts (+1 pseudocount) (c). Error bars in panel b reflect the 95% highest density posterior intervals, with the measure of centre being the mean posterior growth rate. d, Model uncertainties accurately reflect deviations of inferred from true growth rates. Percentage of true synthetic amino acid growth rates falling into a certain highest density interval among all 2040 simulated toxin amino acid variants.
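The calibration check in panel d can be illustrated with a short sketch that computes a highest density interval from posterior samples and counts how often the true value falls inside it. The sample-based interval below is one common estimator (the narrowest window containing the requested fraction of sorted samples) and is an assumption, as are the function names; the authors' implementation may differ.

```python
import numpy as np

def hdi(samples, mass=0.95):
    # Narrowest interval containing `mass` of the sorted samples.
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    k = int(np.ceil(mass * n))            # number of samples inside the interval
    widths = s[k - 1:] - s[:n - k + 1]    # width of every candidate window
    i = int(np.argmin(widths))
    return float(s[i]), float(s[i + k - 1])

def coverage(true_values, posterior_samples, mass=0.95):
    # Fraction of variants whose true value lies inside its own interval.
    hits = 0
    for truth, samples in zip(true_values, posterior_samples):
        low, high = hdi(samples, mass)
        hits += int(low <= truth <= high)
    return hits / len(true_values)
```

For a well-calibrated model, the coverage at a given mass should be close to that mass, for example about 95% of true growth rates inside the 95% intervals.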
Extended Data Fig. 10 Posterior predictive checks show that the Bayesian hierarchical model can capture observed data statistics for both replicate experiments, whereas a non hierarchical model cannot.
a,b, A non-hierarchical model, in which all synonymous codon variants have the same growth rate (a), cannot explain the observed data. (b) The observed standard deviation of log-read ratios for synonymous wild-type toxin codon variants (red) (n = 278) falls outside of the non-hierarchical model’s expectations (grey). c, The synonymous amino acid mutant standard deviations within a replicate (y axis) are higher than codon mutant standard deviations between replicates (x axis). Light green indicates binned average. d, Bayesian hierarchical model allows for growth rate variation between synonymous codon mutants by drawing these from a Gaussian distribution. e-g, Observed data statistics fall within the hierarchical Bayesian model’s expected values. (e) The observed standard deviation of synonymous wild-type toxin codon mutant log-read ratios (red) falls within the model simulated values (stdev(log(c_post1k/c_pre1k)) or stdev(log(c_post2k/c_pre2k)) for biological replicate 1 or 2 respectively, see model code). Compare to panel (b) for the non-hierarchical model. (f) For each codon mutant, the hierarchical Bayesian model allows for simulating pre- and post-selection read counts (log(c_posti,n/c_prei,n), see ED Fig. 9), including log-read ratios, using the posterior parameter distribution. For each codon mutant, we calculate the p-value statistic (i.e. the fraction of simulated samples falling below the observed log-read ratio). (g) Distribution of posterior simulated p-values for various statistics, demonstrating that no observed data statistic is biased to fall outside of the posterior simulated statistics.
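The p-value statistic defined in panel f can be written down directly. The sketch below assumes that the simulated log-read ratios for one codon mutant are available as an array; the function name is hypothetical.

```python
import numpy as np

def posterior_predictive_p(simulated_log_ratios, observed_log_ratio):
    # Fraction of posterior-simulated log-read ratios that fall below the observed one.
    simulated = np.asarray(simulated_log_ratios, dtype=float)
    return float(np.mean(simulated < observed_log_ratio))
```

Values piling up near 0 or 1 across many codon mutants would indicate that the observed data sit in the tails of what the model simulates, which is the kind of bias panel g is checking for.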
Supplementary information
Supplementary Tables
Supplementary Tables: 1, Spatial distances of rescuing toxin substitutions to the antitoxin; 2, Strains created in this study; 3, Primers used in this study.
Supplementary Data
Location of beneficial toxin substitutions on the crystal structure.
About this article
Cite this article
Ding, D., Green, A.G., Wang, B. et al. Co-evolution of interacting proteins through non-contacting and non-specific mutations. Nat Ecol Evol 6, 590–603 (2022). https://doi.org/10.1038/s41559-022-01688-0
DOI: https://doi.org/10.1038/s41559-022-01688-0