Extended Data Fig. 3: Structural composition of generated CRISPR-Cas proteins. | Nature

Extended Data Fig. 3: Structural composition of generated CRISPR-Cas proteins.

From: Design of highly functional genome editors by modelling CRISPR–Cas sequences


Generated and natural CRISPR-Cas proteins were clustered at 70% identity using MMseqs2 (ref. 60). For both generated and natural proteins, random representatives of the largest 5,000 clusters were selected for structural analysis. Structures were predicted using the ColabFold (ref. 61) implementation (v1.5.2) of AlphaFold2 (ref. 30) with multiple sequence alignments (MSAs) from the ColabFoldDB (no templates). a) Generated sequences yield high-confidence AlphaFold2 structure predictions despite significant sequence divergence from natural proteins. b) Predicted structures for generated proteins align well to experimentally determined structures from the PDB. c) Using Foldseek (ref. 62; v8.ef4e960), predicted structures for generated and natural proteins were searched against the SCOPe database (ref. 31; v2.08). Points in the left plot represent the fraction of generated (green) and natural (gray) proteins containing the twenty most commonly observed SCOPe families among both generated and natural sequences. Distributions in the right plot show the sequence identity over aligned residues between the generated and natural proteins and the best-matching SCOPe family structure. Overall, generated proteins were composed of structural components similar to those of natural proteins, with similar levels of per-domain sequence similarity to particular SCOPe families in the two sets of sequences. d) Examples of the four most frequently observed SCOPe families (brown) aligned to generated (green) and natural (gray) proteins.
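
The selection step described in the caption (one random representative from each of the largest clusters) can be sketched as follows. This is an illustrative sketch only, not the authors' script: it assumes the clustering result is available as the two-column, tab-separated table that MMseqs2 clustering workflows write (cluster representative ID, then member ID), and it reads "random representative" as a uniformly random member of each cluster. The function name `pick_cluster_representatives` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def pick_cluster_representatives(tsv_lines, n_largest=5000, seed=0):
    """Return {cluster_id: randomly chosen member} for the n_largest clusters.

    tsv_lines: iterable of "representative<TAB>member" strings (assumed format).
    """
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep_id, member_id = line.split("\t")
        clusters[rep_id].append(member_id)

    # Rank clusters by size (largest first); break ties by ID so the
    # ranking is deterministic.
    ranked = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    # A seeded generator keeps the random choice reproducible.
    rng = random.Random(seed)
    return {
        rep_id: rng.choice(sorted(members))
        for rep_id, members in ranked[:n_largest]
    }


# Toy input: three clusters of sizes 3, 2 and 1.
example = [
    "A\tA", "A\tB", "A\tC",
    "D\tD", "D\tE",
    "F\tF",
]
picked = pick_cluster_representatives(example, n_largest=2, seed=0)
print(sorted(picked))  # IDs of the two largest clusters
```

Applying the same function separately to the generated and the natural clustering tables would give two equally sized sets for structure prediction, as long as each table contains at least `n_largest` clusters.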
