Extended Data Fig. 1: Bioinformatic identification and conservation of II-C Cas9s.
From: Pro-CRISPR PcrIIC1-associated Cas9 system for enhanced bacterial immunity

a, Pipeline for identifying novel Cas9 systems from the self-collected dataset. b, Length distribution of Cas9s in the HBGC dataset. NAG+ Cas9s are shown as squares in various colours and NAG− Cas9s as blue circles. The dashed line marks the size of the smallest NAG+ Cas9 (1,400 aa). c, Box plots of Cas9 protein size across subtypes in the GEM/HBGC and Makarova NCBI datasets. Protein integrity in each dataset was checked manually using AlphaFold predictions. The number of Cas9s in each group is shown above each box. The box plots show the minima, maxima, centre, bounds of box and whiskers, and percentiles, with all data points shown. d, Primary sequence conservation analysis of II-C Cas9s. The identified II-C Cas9s were aligned to CjCas9 to calculate the conservation ratio at each amino acid position. CjCas9 residues with a conservation ratio above 0.98 are labelled. Catalytic residues of the HNH and RuvC domains are coloured red and blue, respectively: H559, N573 and N582 in the HNH domain and D8, E479 and D710 in the RuvC domain (CjCas9 numbering). The red dashed line marks a ratio of 0.98 (high conservation) and the black dashed line marks 0.2 (low conservation). e, Left, structural alignment of full-length Cas9 proteins, showing that the orientation of the HNH domain is highly variable. Right, structural alignment of the isolated HNH domains, showing that the HNH domain structure is conserved across Cas9 proteins.
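The per-residue conservation ratio in panel d can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that the II-C Cas9s are already in a multiple sequence alignment with CjCas9 as the gapped reference row. It also assumes that the ratio is the fraction of aligned homologues carrying the identical residue in the same column, with gaps counted as mismatches and CjCas9 itself excluded from the count. The function names `conservation_ratios` and `highly_conserved` and the toy sequences are hypothetical. Only the 0.98 threshold comes from the caption.

```python
from typing import List, Tuple


def conservation_ratios(reference: str, aligned: List[str]) -> List[Tuple[int, str, float]]:
    """Return (reference residue number, residue, conservation ratio) per reference residue.

    Assumption (hypothetical definition): the ratio is the fraction of aligned
    homologues whose residue in the same alignment column is identical to the
    reference residue; gaps in a homologue count as mismatches.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    for seq in aligned:
        if len(seq) != len(reference):
            raise ValueError("all aligned sequences must have the reference's gapped length")

    results = []
    ref_pos = 0  # 1-based residue number in the ungapped reference (e.g. CjCas9 numbering)
    for col, ref_aa in enumerate(reference):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference residue to score
        ref_pos += 1
        matches = sum(1 for seq in aligned if seq[col] == ref_aa)
        results.append((ref_pos, ref_aa, matches / len(aligned)))
    return results


def highly_conserved(ratios: List[Tuple[int, str, float]], threshold: float = 0.98) -> List[str]:
    """Label reference residues (e.g. 'D8') whose ratio is strictly above the threshold."""
    return [f"{aa}{pos}" for pos, aa, ratio in ratios if ratio > threshold]


if __name__ == "__main__":
    # Toy alignment for illustration only; not data from the figure.
    reference = "MD-KE"
    homologues = ["MDAKE", "MD-RE", "ME-KE", "MDGKE"]
    ratios = conservation_ratios(reference, homologues)
    print(ratios)                    # [(1, 'M', 1.0), (2, 'D', 0.75), (3, 'K', 0.75), (4, 'E', 1.0)]
    print(highly_conserved(ratios))  # ['M1', 'E4']
```

Residue numbers follow the ungapped reference, so labels read in reference (CjCas9) coordinates, as in panel d. A low-conservation cut-off such as 0.2 could be applied to the same ratios.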