Co-evolution of interacting proteins through non-contacting and non-specific mutations

Ding, David; Green, Anna G.; Wang, Boyuan; Lite, Thuy-Lan Vo; Weinstein, Eli N.; Marks, Debora S.; Laub, Michael T.

doi:10.1038/s41559-022-01688-0

Article
Published: 31 March 2022

David Ding^1,2,
Anna G. Green^2,3,
Boyuan Wang^4,
Thuy-Lan Vo Lite^5 (ORCID: orcid.org/0000-0003-2743-4231),
Eli N. Weinstein^6,
Debora S. Marks^2 &
Michael T. Laub^1,7 (ORCID: orcid.org/0000-0002-8288-7607)

Nature Ecology & Evolution volume 6, pages 590–603 (2022)

Abstract

Proteins often accumulate neutral mutations that do not affect current functions but can profoundly influence future mutational possibilities and functions. Understanding such hidden potential has major implications for protein design and evolutionary forecasting but has been limited by a lack of systematic efforts to identify potentiating mutations. Here, through the comprehensive analysis of a bacterial toxin–antitoxin system, we identified all possible single substitutions in the toxin that enable it to tolerate otherwise interface-disrupting mutations in its antitoxin. Strikingly, the majority of enabling mutations in the toxin do not contact the mutated antitoxin residues and promote tolerance non-specifically to many different antitoxin mutations, despite covariation in homologues occurring primarily between specific pairs of contacting residues across the interface. In addition, the enabling mutations we identified expand future mutational paths that both maintain old toxin–antitoxin interactions and form new ones. These non-specific mutations are missed by widely used covariation and machine learning methods. Identifying such enabling mutations will be critical for ensuring continued binding of therapeutically relevant proteins, such as antibodies, aimed at evolving targets.

**Fig. 1: Comprehensive identification of neutral and enabling mutations for the toxin–antitoxin system ParE3–ParD3.**

**Fig. 2: Deep mutational scanning reveals mutational tolerance and interface-disrupting substitutions in ParE3–ParD3.**

**Fig. 3: Beneficial, interaction-restoring mutations can be far from the deleterious mutation they rescue.**

**Fig. 4: Non-specific enabling mutations outnumber specific mutations and can be far from the deleterious mutation as well as the interface.**

**Fig. 5: Natural sequences and models trained on these provide insufficient information to predict enabling mutations.**

**Fig. 6: Non-specifically enabling mutations expand mutational paths to maintain old and evolve new interactions.**

Acknowledgements

We thank members of the Laub and Marks laboratories, A. Batchelor, C. McClune, J. Ingraham, A. Schoech and I. Cvijovic for helpful discussions. We thank A. Murray, N. Gauthier, T. Okubo, S. Sinai and N. Youssef for feedback on the manuscript and M. Stiffler for sharing protocols before publication. This work was supported by the Howard Hughes Medical Institute (M.T.L.), National Institutes of Health grant no. R01CA260415 (D.S.M.), Chan Zuckerberg Initiative CZI2018-191853 (D.S.M.), Ashford PhD fellowship (D.D.), Boehringer Ingelheim Funds PhD fellowship (D.D.), Fanny and John Hertz Fellowship (E.N.W.), National Institutes of Health NLM training grant no. T15LM007092 (A.G.G.), National Institutes of Health grant no. T32GM007753 (T.-L.V.L.), Jane Coffin Childs Memorial Fund for Medical Research fellowship (B.W.) and National Institutes of Health grant no. K99GM135536 (B.W.).

Author information

Authors and Affiliations

Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
David Ding & Michael T. Laub
Department of Systems Biology, Harvard Medical School, Boston, MA, USA
David Ding, Anna G. Green & Debora S. Marks
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Anna G. Green
Department of Pharmacology, UT Southwestern Medical Center, Dallas, TX, USA
Boyuan Wang
Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA
Thuy-Lan Vo Lite
Program in Biophysics, Harvard University, Boston, MA, USA
Eli N. Weinstein
Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
Michael T. Laub

Contributions

D.D., D.S.M. and M.T.L. conceived the project and wrote the paper. D.D. designed and performed experiments, analysed data and built the quantitative models. A.G.G. performed covariation analysis for ~350 protein–protein interactions. B.W. helped with library transformations. T.-L.V.L. created the combinatorial antitoxin mutant library. E.N.W. suggested helpful tips on Bayesian modelling. D.S.M. and M.T.L. supervised the project.

Corresponding author

Correspondence to Michael T. Laub.

Ethics declarations

Competing interests

D.S.M. is an advisor for Dyno Therapeutics, Octant, Jura Bio, Tectonic Therapeutics and Genentech and a cofounder of Seismic. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks Hsin-Hung Chou and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Orthogonal validation of growth rate inference, structural explanation for antitoxin mutation effects, and covariational signal between toxin–antitoxin ParE3/ParD3.

a, Comparison of growth rates inferred by high-throughput vs. individual growth measurement. X axis error bars indicate ±2× standard deviation derived from n = 10 or n = 11 technical plate reader replicates (see Methods). Y axis error bars indicate 95% posterior highest density interval. The Pearson correlation coefficient (r) is indicated. b, Raw log-read ratio reproducibility between replicates (+1 pseudocount) for all single and double mutants. The Pearson correlation coefficient (r) is indicated. c, Mean mutation effect of residues in the C-terminal α-helix 3 of the ParD3 antitoxin indicates that residues facing the toxin are more susceptible to mutations that disrupt the ParD3–ParE3 interaction, producing negative Δgrowth rate values. d, Mean mutation effect in the N-terminal oligomerization region of the antitoxin highlights residues susceptible to disrupting the ParD3–ParE3 interaction when mutated. Cartoon illustrates arrangement of ParE3–ParD3 octamer observed in the co-crystal structure (PDB: 5CEG). One of the 4 antitoxin monomers is coloured by the mean mutation effect. e, Top 10 toxin–antitoxin covarying residue pairs indicated for reference. f, The 90% precision cutoff yields 29 toxin–antitoxin covarying residue pairs (black in upper, right quadrant) of which 28 pairs fall within toxin–antitoxin interface residues that are < 6 Å minimum atom distance (ochre dots) in the ParE3-D3 crystal structure (PDB ID: 5CEG).
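The replicate comparison in panel b can be illustrated with a minimal sketch. It assumes read counts before and after selection, adds the +1 pseudocount mentioned in the caption, and compares two replicates with a Pearson correlation. The function names, the natural-log base and the read counts are illustrative assumptions, not taken from the article.

```python
import numpy as np

def log_read_ratio(pre_counts, post_counts, pseudocount=1.0):
    """Log ratio of post- to pre-selection reads, with a pseudocount added to both."""
    pre = np.asarray(pre_counts, dtype=float) + pseudocount
    post = np.asarray(post_counts, dtype=float) + pseudocount
    return np.log(post / pre)  # natural log is an assumption

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical read counts for four variants in two replicates.
pre_rep1, post_rep1 = [100, 250, 0, 80], [200, 125, 3, 80]
pre_rep2, post_rep2 = [120, 240, 1, 90], [230, 130, 5, 85]

ratios_rep1 = log_read_ratio(pre_rep1, post_rep1)
ratios_rep2 = log_read_ratio(pre_rep2, post_rep2)
r = pearson_r(ratios_rep1, ratios_rep2)
print(f"Pearson r between replicates: {r:.3f}")
```

The pseudocount keeps the ratio finite for variants with zero reads in either sample, at the cost of shrinking low-count ratios towards zero.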

Extended Data Fig. 2 Titration of toxin and antitoxin expression levels, and sensitive identification of toxin substitutions which do not disrupt toxicity.

a, Cartoon illustration of the expression system. IPTG induces antitoxin, arabinose induces toxin. b, Growth rate of cells harbouring wild-type toxin ParE3 without antitoxin at different arabinose induction levels in arabinose-titratable E. coli strain BW27783. c, d, Growth rate of cells harbouring wild-type toxin–antitoxin ParE3/ParD3 under different antitoxin induction levels modulated with IPTG and 0.00012% arabinose induction (c) or 0.0008% arabinose induction (d). e, Distribution of ∆growth rates(T*-T) for all toxin single substitutions under different arabinose inducer concentrations, with positive ∆growth rate(T*-T) values indicating loss of toxin function. The set of ‘most toxic’ toxin substitutions (n = 310) is coloured in light blue, the set of ‘toxic’ substitutions (n = 781) is coloured in green (see Methods). Other classes of substitutions are indicated. The dynamic range (difference between 0 and the truncated toxin mutants) shrinks, as expected, for lower expression levels that do not fully inhibit growth with the wild-type toxin, and a higher fraction of mutants show loss of toxicity (higher ∆growth rates) under lower expression conditions. The toxin substitution A28Q is highlighted (dark blue) as an example that shows no growth rate difference relative to wild-type toxin at high expression conditions, but is not as toxic as wild-type toxin at lower expression conditions. f, Schematic illustrating loss of toxicity detection using growth rate measurements in different expression regimes. g, Mean ∆growth rates(T*-T) of residue positions mapped onto the ParE3 toxin structure. Values shown for 0.00012% [arabinose] inducer. h, The mean ∆growth rates(T*-T) of a residue are correlated with the relative solvent accessibility of the residue (Pearson r = −0.66). Values shown for 0.00012% [arabinose] inducer. 
i,j, Distribution of ∆growth rate(T*-T) for all toxin substitutions (black) or top 10 coevolving residue substitutions (purple) in the toxin in absence of antitoxin (i) or presence of antitoxin (j). Values shown for 0.00012% [arabinose] inducer, and antitoxin is induced with 10 µM IPTG. k, The ∆growth rate(T*-T) values of each substitution at any position along the toxin ParE3. Green highlights the top 10 covarying positions between toxin and antitoxin in natural homologues. Values shown for 0.00012% [arabinose] inducer.

Extended Data Fig. 3 Volcano plot visualizing significant and substantial beneficial toxin variants in different antitoxin backgrounds, and beneficial toxin variants in various antitoxin backgrounds under ‘high’ and ‘low’ antitoxin expression conditions.

a, For each deleterious antitoxin variant background, the mean posterior change in the number of doublings, ∆growth rate(T*/AT* - T/AT*), of each of the most toxic toxin mutants is plotted vs. the significance (-log10(p(∆growth rate<0))) of its deviation from the AT* single mutation. This is based on 10,000 discrete samples of the posterior ∆growth rate(T*/AT* - T/AT*) values inferred from the hierarchical Bayesian inference model (see Methods). Vertical line: +0.5 ∆growth rate, horizontal line: p(∆growth rate<0) = 0.0001. Red indicates significant and substantial beneficial toxin substitutions using this cutoff. Experiments performed under ‘high antitoxin’ expression conditions. b, The minimum atom distance from a given deleterious antitoxin residue to each beneficial toxin variant is plotted vs. ∆growth rate(T*/AT* - T/AT*). Experiments performed under ‘high antitoxin’ expression conditions. c, The minimum atom distance from a given deleterious antitoxin residue to each beneficial toxin variant is plotted vs. ∆growth rate(T*/AT* - T/AT*). Experiments performed under ‘low antitoxin’ expression conditions. d, Distance vs. ∆growth rate(T*/AT* - T/AT*) of beneficial toxin variants for all deleterious antitoxin variant backgrounds. Experiments performed under ‘low antitoxin’ expression conditions. Values for (b-d) shown for double mutants with ∆growth rate effect size > +0.5 and p(∆growth rate<0) < 0.0001.
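The volcano-plot cutoffs in panel a can be sketched as follows, assuming that each double mutant comes with a vector of posterior ∆growth rate samples. The effect size is the sample mean and the significance is the fraction of samples below zero. The function name and the simulated samples are hypothetical; only the +0.5 and 0.0001 cutoffs and the 10,000-sample count come from the caption.

```python
import numpy as np

def classify_beneficial(delta_samples, effect_cutoff=0.5, p_cutoff=1e-4):
    """Return (mean effect, tail probability below zero, passes both cutoffs)."""
    samples = np.asarray(delta_samples, dtype=float)
    mean_effect = float(samples.mean())
    p_below_zero = float(np.mean(samples < 0))  # empirical posterior tail probability
    is_hit = (mean_effect > effect_cutoff) and (p_below_zero < p_cutoff)
    return mean_effect, p_below_zero, is_hit

# Simulated posterior samples stand in for model output (assumption).
rng = np.random.default_rng(0)
strong_rescue = rng.normal(loc=1.0, scale=0.1, size=10_000)
weak_rescue = rng.normal(loc=0.2, scale=0.3, size=10_000)

print(classify_beneficial(strong_rescue))
print(classify_beneficial(weak_rescue))
```

With 10,000 samples the smallest non-zero tail probability is 0.0001, so a strict cutoff of p < 0.0001 is met only when no sample falls below zero.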

Extended Data Fig. 4 A non-specific, nonlinear model can explain most of the observed single and double-mutant growth rates.

a, Schematic of nonlinear, non-specific model: double-mutant expected growth rates (brown) are based on the independent (non-specific) sum of underlying toxin and antitoxin mutant effects, passed through a sigmoid function (yellow). b,c, Residuals for nonlinear, non-specific model (b) or linear non-specific model of the same structure without a non-linearity (c) showing unbiased residuals for the nonlinear model, but a complete misfit of the linear model. Model built using ‘high antitoxin’ expression levels. Explained variance (R²) is indicated. Significant and substantially positively (dark green) or negatively (green) deviating mutations are shown in (b) (see Methods). d, Inferred independent toxin single-substitution effects among the set of most toxic toxin mutants demonstrating a tail of independently beneficial toxin variants. Experiment performed under ‘high antitoxin’ expression levels. e,f, Nonlinear independent model fit to growth rates measured under ‘high antitoxin’ (e) or ‘low antitoxin’ (f) expression conditions. The wild-type toxin–antitoxin pair is inferred to be differently close to the sigmoid ‘cliff’ between expression conditions. g, Cartoon illustrating different detection of single-mutant effects depending on expression conditions. h-j, Correlation of inferred single-mutant effects (h), observed single-mutant ∆growth rate(T*/AT* - T/AT) effects (i), and double-mutant deviations of observed from expected growth rates (j) from separate inference under ‘high antitoxin’ (x axis) or ‘low antitoxin’ (y axis) expression conditions.
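The model in panel a can be illustrated with a minimal sketch: independent toxin and antitoxin effects are summed on a latent scale and passed through a sigmoid. The offset that places the wild-type pair on the plateau, the sign convention and all effect sizes are assumptions chosen for illustration and are not fitted values from the article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def expected_growth(toxin_effect, antitoxin_effect, wild_type_offset=3.0, max_growth=1.0):
    """Expected growth under an additive latent score passed through a sigmoid."""
    latent = wild_type_offset + toxin_effect + antitoxin_effect
    return max_growth * sigmoid(latent)

# Hypothetical effects: a disruptive antitoxin mutation and an enabling toxin mutation.
wt = expected_growth(0.0, 0.0)             # wild-type pair, on the plateau
antitoxin_only = expected_growth(0.0, -5.0)  # falls off the sigmoid 'cliff'
toxin_only = expected_growth(2.0, 0.0)     # nearly neutral in the wild-type background
double = expected_growth(2.0, -5.0)        # partial rescue of the antitoxin mutation

print(f"wt={wt:.3f} AT*={antitoxin_only:.3f} T*={toxin_only:.3f} T*/AT*={double:.3f}")
```

In this sketch the same toxin effect changes growth only slightly in the wild-type background but substantially in the disrupted background, which is how a nonlinearity can make a non-specific mutation look neutral alone and enabling in combination.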

Extended Data Fig. 5 Deviation of observed from expected double-mutant growth rates reveals toxin variants with specific or with only non-specific beneficial effects, and fraction of specific vs. non-specific toxin variants.

a, For each beneficial toxin mutation (indicated above each plot) combined with each antitoxin variant indicated on the x axis, the plot shows the growth rate relative to the wild-type toxin–antitoxin pair (mean posterior ∆growth rate(T*/AT* - T/AT)). Grey dots represent T*/AT*, error bars indicate 95% posterior highest density interval. The ∆growth rate for each antitoxin mutant combined with wild-type toxin (T/AT*) is shown (black dots) along with the ∆growth rate for T*/AT* expected under the non-specific, nonlinear model (green dots). b, Deviation of the observed (dots) from the expected double-mutant growth rates (orange line) highlights classification of specific and non-specific toxin variants. Beneficial toxin substitutions (rows, n = 32) ordered by their range of growth rate deviations across deleterious antitoxin variants as in panel b. c-g, Specific vs. non-specific enabling toxin variants under ‘high’ antitoxin expression for all enabling toxin variants grouped by deleterious antitoxin for the more stringent set of 310 ‘most toxic’ toxins (c) and less stringent set of 781 ‘toxic’ toxins (d). Orange and purple indicate mutant pairs involving non-specific and specific, respectively, rescuing mutations in the toxin. Enabling toxin variants under ‘low’ antitoxin expression at different absolute growth rate cutoffs relative to the wild-type toxin/antitoxin growth rate (e), or grouped by ‘most toxic’ (f) or ‘toxic’ (g) toxin variants. h, Inferred non-specific toxin variant effect vs. minimum atom distance to any antitoxin atom for 21 non-specifically rescuing toxin variants (orange). i, j, For specific and non-specific beneficial toxin mutants, the change in growth rate in a deleterious antitoxin mutant background, ∆growth rate (T*/AT* - T/AT*), is plotted vs. minimum atom distance to the deleterious antitoxin mutation it rescues (i) or any antitoxin atom (j) in the ‘low antitoxin’ expression condition.
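The distinction drawn in panel b can be sketched as a residual test: a toxin variant whose observed double-mutant growth rates match the non-specific expectation in every antitoxin background is treated as non-specific, and one that exceeds the expectation in some background is treated as specific. The cutoff value, function name and data below are hypothetical; the article's actual criteria are described in its Methods.

```python
import numpy as np

def classify_toxin_variant(observed, expected, deviation_cutoff=0.5):
    """Label a toxin variant from observed-minus-expected growth rates across backgrounds."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(expected, dtype=float)
    # Any large positive deviation suggests rescue beyond the non-specific expectation.
    label = "specific" if np.any(residuals > deviation_cutoff) else "non-specific"
    return label, residuals

# Hypothetical growth rates in three deleterious antitoxin backgrounds.
expected = [0.20, 0.30, 0.25]
observed_a = [0.25, 0.28, 0.30]  # tracks the expectation everywhere
observed_b = [0.20, 1.10, 0.25]  # exceeds the expectation in one background

label_a, _ = classify_toxin_variant(observed_a, expected)
label_b, _ = classify_toxin_variant(observed_b, expected)
print(label_a, label_b)
```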

Extended Data Fig. 6 Natural sequence statistics, EVcouplings or DeepSequence models are not predictive of beneficial toxin substitution effects.

a, Distribution of number of specific and non-specific beneficial toxin substitutions (purple) vs. all possible toxin variants (grey) observed in natural sequences. b, Frequency distribution of beneficial toxin and deleterious antitoxin mutant pairs in natural sequences, with 29/51 pairs never observed. c-e, Effect size of toxin variant rescue vs. frequency of variant pair in natural sequences (c), conditional frequency of toxin variant given natural sequences containing the particular deleterious antitoxin substitution (d), or enrichment of beneficial toxin variant in natural sequences containing the deleterious antitoxin substitution (e). f-g, EVcouplings model inferred site-wise toxin mutant preferences (h_i) vs. toxin mutant effect inferred in suppressor scan with the Pearson correlation coefficient indicated (f), or EVcouplings pairwise T*/AT* variant preference (J_ij) vs. effect size of beneficial toxin mutation effect in a deleterious antitoxin variant background (g). h, Scatterplot of observed beneficial toxin effect in deleterious antitoxin mutant backgrounds (AT*), vs. EVmutation (top row) or DeepSequence (variational auto-encoder) mutation effect predictions (bottom row). Pearson correlation (r) is indicated. i, Distribution of natural sequence identity fractions across the alignment. Different histograms illustrate fraction mutated for homologues containing the full concatenated toxin and antitoxin (grey), the toxin homologues only (blue), or the antitoxin homologues only (turquoise).
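The three natural-sequence statistics in panels c-e can be sketched for one toxin/antitoxin residue pair. The sketch assumes that each natural homologue has been reduced to the residues found at the two positions of interest, and it defines enrichment as the conditional frequency divided by the overall toxin-variant frequency. That definition, the function name and the toy alignment are assumptions, not the article's stated procedure.

```python
def pair_statistics(pairs, toxin_aa, antitoxin_aa):
    """Return (pair frequency, conditional frequency, enrichment) for one variant pair.

    `pairs` holds one (toxin residue, antitoxin residue) tuple per natural sequence.
    """
    n_total = len(pairs)
    n_toxin = sum(1 for t, _ in pairs if t == toxin_aa)
    n_antitoxin = sum(1 for _, a in pairs if a == antitoxin_aa)
    n_both = sum(1 for t, a in pairs if t == toxin_aa and a == antitoxin_aa)

    pair_frequency = n_both / n_total
    if n_antitoxin == 0 or n_toxin == 0:
        return pair_frequency, float("nan"), float("nan")
    conditional = n_both / n_antitoxin           # P(toxin variant | antitoxin variant)
    enrichment = conditional / (n_toxin / n_total)  # relative to the background frequency
    return pair_frequency, conditional, enrichment

# Toy alignment of eight homologues (hypothetical residues).
toy_pairs = [("K", "D"), ("K", "D"), ("R", "D"), ("R", "E"),
             ("R", "E"), ("K", "E"), ("R", "E"), ("R", "E")]
pair_freq, cond_freq, enrich = pair_statistics(toy_pairs, "K", "D")
print(pair_freq, cond_freq, enrich)
```

A pair that never co-occurs in the alignment gets a pair frequency of zero, which is the situation the caption reports for 29 of the 51 beneficial pairs.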

Extended Data Fig. 7 Non-specific suppressor toxin ParE3 variants are as toxic, or almost as toxic, as wild-type ParE3, and reproducibility of antitoxin combinatorial variant log-read ratios.

a, Growth rates of ParE3 non-specific suppressor toxin variants (blue) compared to wild-type toxin ParE3 without antitoxin (black) and wild-type toxin and antitoxin (grey) under fully inhibitory toxin expression conditions (0.00012% [arabinose]) or half-maximal inhibitory expression conditions (0.00006% [arabinose]). Dark lines represent the mean OD600, shaded regions show standard deviation of the replicates (n = 10 or n = 11). b, Raw log-read ratio reproducibility between biological replicates (+1 pseudocount) for the combinatorial antitoxin library (8000 amino acid variants) in different toxin mutant backgrounds. Specific classes of antitoxin mutants, and Pearson correlation coefficients (r) are indicated.
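
The reproducibility measure in panel b (raw log-read ratios with a +1 pseudocount, compared between replicates by Pearson correlation) can be sketched as follows. The read counts are invented for illustration and the helper names are hypothetical; this is not the authors' analysis code.

```python
import math


def log_read_ratio(pre, post, pseudocount=1):
    """Log of post- over pre-selection reads, with a pseudocount to avoid log(0)."""
    return math.log((post + pseudocount) / (pre + pseudocount))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented (pre, post) read counts for the same four variants in two replicates.
replicate_1 = [(120, 30), (80, 95), (200, 0), (50, 60)]
replicate_2 = [(110, 25), (90, 100), (210, 2), (45, 70)]

ratios_1 = [log_read_ratio(pre, post) for pre, post in replicate_1]
ratios_2 = [log_read_ratio(pre, post) for pre, post in replicate_2]
r = pearson_r(ratios_1, ratios_2)
print(round(r, 3))
```

The pseudocount matters for variants such as the third one in replicate 1, whose post-selection count is zero.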

Extended Data Fig. 8 Bayesian hierarchical model.

a, Simplified description of the Bayesian hierarchical model. Pre- and post-selection reads for each codon are drawn from a Poisson distribution. The log-ratios of these Poisson parameters are not fixed between synonymous codons but are instead drawn from a normal distribution, whose mean forms the amino acid mutant growth rate of interest. This model allows for different synonymous codons to inform each other as well as the amino acid mutant growth rate without being completely fixed. b, Full plate diagram description of the hierarchical Bayesian model capturing both replicates. Replicate index i takes values 1 or 2, amino acid index m takes on values ranging from 1-2040 (20*102) for the toxin or 1-1840 (92 * 20) for the antitoxin, codon index n takes on values ranging from 1-6426 (63*102) for the toxin or 1-5796 (63*92) for the antitoxin. Circles indicate random variables, grey circles represent observed random variables. c, Description of variables, likelihood function and priors used. The likelihood function incorporates maximum entropy distributions for the observed variables, and the priors incorporate computationally tractable, vague priors for the amino acid substitution growth rates. The relative priors on the standard deviation of replicate σ_rep_n vs. synonymous variant σ_syn_m reflect our prior belief that replicate experiment noise is larger than synonymous mutant noise. σ_b_i and r_scale have improper priors.
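
The simplified description in panel a can be read as a generative process, sketched below for a single amino acid variant in a single replicate. This is a deliberately reduced illustration: the replicate level, the scale parameters and the priors from panels b and c are omitted, and the function names, the fixed pre-selection rate and all numeric values are assumptions rather than the published model.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (adequate for modest rates)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_amino_acid_variant(growth_rate, n_codons, sigma_syn, pre_rate, rng):
    """Simulate (pre, post) read counts for the synonymous codons of one variant.

    Each codon's log-ratio of Poisson rates is drawn from a normal distribution
    centred on the amino acid growth rate, so synonymous codons are similar
    but not forced to be identical.
    """
    counts = []
    for _ in range(n_codons):
        codon_log_ratio = rng.gauss(growth_rate, sigma_syn)
        pre = sample_poisson(pre_rate, rng)
        post = sample_poisson(pre_rate * math.exp(codon_log_ratio), rng)
        counts.append((pre, post))
    return counts


rng = random.Random(0)
# Hypothetical values: growth rate -1.0, four synonymous codons.
counts = simulate_amino_acid_variant(-1.0, 4, 0.3, 200.0, rng)
for pre, post in counts:
    print(pre, post, round(math.log((post + 1) / (pre + 1)), 2))
```

Setting `sigma_syn` to zero recovers the non-hierarchical case in which all synonymous codons share exactly the same growth rate.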

Extended Data Fig. 9 Validation of Bayesian growth rate inference on synthetic datasets.

a, Three different true synthetic growth rate distributions used for simulating pre- and post-selection codon variant read count data. Synthetic growth rate distributions were chosen from observed toxin single-mutant growth rate distributions in 3 different antitoxin backgrounds, spanning the range of distributions observed. b,c, Inferred growth rates using the Bayesian hierarchical model (b) show less bias and incorporate uncertainty estimates compared to the mean log-read ratio summary of pre- and post-selection read counts (+1 pseudocount) (c). Error bars in panel b reflect the 95% highest density posterior intervals, with the measure of centre being the mean posterior growth rate. d, Model uncertainties accurately reflect deviations of inferred from true growth rates. Percentage of true synthetic amino acid growth rates falling into a certain highest density interval among all 2040 simulated toxin amino acid variants.
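
The calibration check in panel d asks how often the true synthetic growth rate falls inside a highest density interval. A minimal sketch of that calculation follows, assuming posterior samples are available for each variant. Here the "posterior" samples are simply drawn from a normal distribution around a noisy estimate, which is an invented stand-in for real model output.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval that contains the requested fraction of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def coverage(truths, intervals):
    """Fraction of true values that fall inside their interval."""
    inside = sum(1 for t, (lo, hi) in zip(truths, intervals) if lo <= t <= hi)
    return inside / len(truths)


rng = random.Random(0)
truths = [rng.uniform(-2.0, 0.0) for _ in range(200)]  # invented true growth rates
intervals = []
for truth in truths:
    estimate = rng.gauss(truth, 0.1)  # hypothetical posterior mean, off by noise
    posterior = [rng.gauss(estimate, 0.1) for _ in range(500)]
    intervals.append(hdi(posterior, 0.95))

observed_coverage = coverage(truths, intervals)
print(observed_coverage)
```

For a well-calibrated model the printed coverage should be close to the nominal interval mass, and repeating the calculation for several masses gives the kind of curve described in panel d.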

Extended Data Fig. 10 Posterior predictive checks show that the Bayesian hierarchical model can capture observed data statistics for both replicate experiments, whereas a non-hierarchical model cannot.

a,b, A non-hierarchical model, in which all synonymous codon variants have the same growth rate (a), cannot explain the observed data. (b) The observed standard deviation of log-read ratios for synonymous wild-type toxin codon variants (red) (n = 278) falls outside of the non-hierarchical model’s expectations (grey). c, The synonymous amino acid mutant standard deviations within a replicate (y axis) are higher than codon mutant standard deviations between replicates (x axis). Light green indicates binned average. d, The Bayesian hierarchical model allows for growth rate variation between synonymous codon mutants by drawing these from a Gaussian distribution. e-g, Observed data statistics fall within the hierarchical Bayesian model’s expected values. (e) The observed standard deviation of synonymous wild-type toxin codon mutant log-read ratios (red) falls within the model-simulated values (stdev(log(c_post1_k/c_pre1_k)) or stdev(log(c_post2_k/c_pre2_k)) for biological replicate 1 or 2, respectively; see model code). Compare to panel (b) for the non-hierarchical model. (f) For each codon mutant, the hierarchical Bayesian model allows for simulating pre- and post-selection read counts, including log-read ratios (log(c_post_i,n/c_pre_i,n); see ED Fig. 9), using the posterior parameter distribution. For each codon mutant, we calculate the p-value statistic (i.e. the fraction of simulated samples falling below the observed log-read ratio). (g) Distribution of posterior simulated p-values for various statistics, demonstrating that no observed data statistic is biased to fall outside of the posterior simulated statistics.
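
The p-value statistic described in panel f is the fraction of posterior-simulated values that fall below the observed value. A minimal sketch is given below, applied to the standard deviation of synonymous log-read ratios as in panel e. The observed log-read ratios and the simulated statistics are invented numbers, and the function name is hypothetical.

```python
import statistics


def posterior_predictive_pvalue(observed, simulated):
    """Fraction of simulated statistics that fall below the observed statistic."""
    return sum(1 for s in simulated if s < observed) / len(simulated)


# Invented log-read ratios for four synonymous codon variants.
observed_log_ratios = [-0.2, 0.1, 0.3, -0.1]
observed_stat = statistics.stdev(observed_log_ratios)

# Invented standard deviations computed from posterior simulations.
simulated_stats = [0.10, 0.15, 0.20, 0.25, 0.30]

p_value = posterior_predictive_pvalue(observed_stat, simulated_stats)
print(p_value)
```

Values very close to 0 or 1 would indicate that the observed statistic sits in the tail of what the model simulates, which is the failure mode reported for the non-hierarchical model in panel b.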

Supplementary information

Reporting Summary

Peer Review Information

Supplementary Tables

Supplementary Tables: 1, Spatial distances of rescuing toxin substitutions to the antitoxin; 2, Strains created in this study; 3, Primers used in this study.

Supplementary Data

Location of beneficial toxin substitutions on the crystal structure.

About this article

Cite this article

Ding, D., Green, A.G., Wang, B. et al. Co-evolution of interacting proteins through non-contacting and non-specific mutations. Nat Ecol Evol 6, 590–603 (2022). https://doi.org/10.1038/s41559-022-01688-0

Received: 25 September 2021
Accepted: 31 January 2022
Published: 31 March 2022
Version of record: 31 March 2022
Issue date: May 2022
DOI: https://doi.org/10.1038/s41559-022-01688-0
