Fig. 1: The fold space covered by the microbial protein structure universe is continuous. | Nature Communications

From: Sequence-structure-function relationships in the microbial protein universe

a Flowchart of our process to arrive at ~200,000 de novo protein models covering a diverse sequence space. b The sequence length distribution shows that, as expected, our sequences are shorter than many of the proteins in the PDB, CATH or AlphaFold databases. We predicted structures for sequences between 40 and 200 residues long. This range covers most of the length distribution of microbial proteins, which are often shorter than eukaryotic proteins. c The protein structure universe in UMAP space, color-coded by features such as similarity to CATH classes, sequence length, number of helical transmembrane spans, and relative contact order. d Novel folds (blue dots) are spread throughout the fold space, with fewer representatives among the purely α-helical and purely β-sheet folds. [Icons in panel (a) were created by Ronald Vermeijs and Maxim Kulikov for the Noun Project, licensed under the Creative Commons license CC BY 3.0]. Source data for this figure are provided in the source data file.
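Panel c colors models by relative contact order, a measure of how far apart in sequence the residues in contact tend to be. This is commonly defined as RCO = (1 / (L·N)) Σ ΔS_ij, where L is the sequence length, N is the number of contacts and ΔS_ij is the sequence separation of each contacting pair. The caption does not say how contacts were defined for this figure. The sketch below is therefore only an illustration, assuming one representative coordinate per residue (for example Cα), a hypothetical 8 Å distance cutoff and a minimum sequence separation of 3. Neither value is taken from the source.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def relative_contact_order(
    coords: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 3
) -> float:
    """Relative contact order from one coordinate per residue.

    Assumptions (illustrative, not from the source): a contact is any residue
    pair closer than `cutoff` angstroms, and only pairs at least `min_sep`
    positions apart in sequence are counted.
    """
    n_res = len(coords)
    total_sep = 0      # running sum of sequence separations over contacts
    n_contacts = 0     # number of contacting pairs found
    for i in range(n_res):
        for j in range(i + min_sep, n_res):
            if math.dist(coords[i], coords[j]) < cutoff:
                total_sep += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0     # no contacts: define RCO as 0 to avoid division by zero
    # Normalize the mean separation by chain length L
    return total_sep / (n_res * n_contacts)
```

For an extended chain with residues 3.8 Å apart, no pair three or more positions apart falls within the cutoff, so the sketch returns 0.0. A compact fold in which the first and last residues touch returns a larger value. Higher RCO values therefore indicate more long-range, non-local contacts.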