Fig. 2: Cas12a subtypes discovered from metagenomic data. | Nature Communications

From: Discovery of CRISPR-Cas12a clades using a large language model

a Phylogenetic tree of Cas12 proteins. The Cas12a proteins identified in this work are highlighted in red within the Cas12a family. b Cas12a subtypes with different combinations of the accessory proteins Cas4, Cas1, and Cas2. c Statistics of Cas12, Cas1, Cas2, and Cas4 from 300 CRISPR loci that were verified manually. The features of the first 1000 CRISPR loci are analyzed in Supplementary Fig. 5. d Statistics of subtypes in the 300 CRISPR loci. e Sequence length variation across subtypes. DNA sequence length was calculated from the start codon of the Cas12a gene to the end of the first repeat. f Statistics of spacers in different subtypes. g Sequence alignment of direct repeats in the 300 CRISPR loci. The sequence corresponding to the stem-loop region of the crRNA is highlighted with a gray background. h Distribution of Cas proteins across subtypes and species. Subtypes are colored in the inner circle, and species are labeled in the outer circle. Error bars indicate mean ± s.e.m. from three technical replicates (n = 3). Statistical significance was assessed using one-way ANOVA. The symbol ‘#’ indicates that the metagenomes in the corresponding subtypes did not contain spacer sequences. Source data are provided as a Source Data file.
