Extended Data Fig. 1: Bioinformatic and coevolutionary analysis of two-component signalling systems.
From: Engineering orthogonal signalling pathways reveals the sparse occupancy of sequence space

a, Schematic illustrating the challenge of identifying HK*–RR* pairs that are orthogonal to all endogenous histidine kinases and response regulators. For both histidine kinases and response regulators, the specificity-determining residues define a finite sequence space. The specificity-determining residues of each histidine kinase determine the set of response regulators with which it can interact. These sets, or niches in sequence space, are depicted as ovals, and each cognate response regulator is represented by a black dot (bottom left). A similar representation is shown for each response regulator and the set of histidine kinases with which it can interact (top right). The two sequence spaces are connected, as depicted with coloured cones for a single histidine kinase–response regulator pair. The establishment of a new signalling pathway that is orthogonal to existing systems requires that the two new proteins are compatible with each other, but occupy regions of histidine-kinase and response-regulator specificity space that are incompatible with all of the paralogues that are already present. b, Schematic summarizing the endogenous two-component pathways in E. coli with which a new orthogonal pathway must avoid crosstalk. c, Diagram of the DHp domain of the histidine kinase TM0853 (blue) in complex with its cognate response regulator RR0468 (green). Residues that dictate specificity, and which were randomized in our libraries, are space-filled in orange (kinase) and red (substrate). d, Plot summarizing the number of histidine kinases and response regulators in bacterial genomes. e, Visualization of the GREMLIN model, representing the coevolutionary dependencies between the residues of cognate histidine kinases and response regulators. Blue nodes indicate PhoQ residues, green nodes indicate PhoP residues and the darker nodes are the 11 residues that were randomized in the dual PhoQ–PhoP library. Edge widths indicate the strength of the coevolutionary signal, and the node size of each residue represents the total coevolutionary signal to residues on the other protein.
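
The encoding described for panel e can be illustrated with a minimal sketch. It assumes that the "total coevolutionary signal" of a residue is the plain sum of its coupling strengths to residues on the partner protein; the caption does not state the exact aggregation, so this is an assumption. The residue labels, the coupling values and the function name `node_totals` are hypothetical and are not taken from the GREMLIN model or from the paper's data.

```python
from collections import defaultdict

# Hypothetical inter-protein couplings (illustrative values only).
# Keys are (kinase residue, regulator residue) pairs; each value is the
# coupling strength that would set the edge width in the diagram.
couplings = {
    ("HK_1", "RR_1"): 0.5,
    ("HK_1", "RR_2"): 0.25,
    ("HK_2", "RR_2"): 1.0,
}


def node_totals(couplings):
    """Sum, for every residue, its coupling strengths to the other protein.

    Assumption: node size is proportional to this sum.
    """
    totals = defaultdict(float)
    for (hk_residue, rr_residue), strength in couplings.items():
        totals[hk_residue] += strength  # kinase residue accumulates the edge
        totals[rr_residue] += strength  # regulator residue accumulates it too
    return dict(totals)


totals = node_totals(couplings)
for residue in sorted(totals):
    print(residue, totals[residue])
```

Because every edge links one kinase residue to one regulator residue, each coupling contributes once to a node on either side, so a residue with many strong inter-protein edges is drawn larger than one with few or weak edges.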