Fig. 3: Expansion of Cas9 PAM diversity using CICERO predictions.
From: Uncovering Cas9 PAM diversity through metagenomic mining and machine learning

a Expansion of Cas9 PAM predictions using CICERO-650M. The initial 8003 PAMs bioinformatically inferred from metagenomic datasets (gray) are extended to over 50,000 via CICERO-650M (colored). Each color represents a confidence bin of predicted PAMs. Notably, over 60% of the unlabeled Cas9 sequences received predictions with a confidence score above 0.7. b Phylogenetic tree of Cas9 protein clusters whose consensus PAMs were either inferred from metagenomic datasets or predicted with high confidence (confidence score > 0.7) by CICERO. The rings, from innermost to outermost, show the most likely nucleotide at each of the 10 PAM positions. The outermost ring indicates the source of the PAM prediction, either inferred PAM (CRISPR-PAMdb) or CICERO, and, for CICERO-derived predictions, the associated confidence score. Source data are provided as a Source Data file.
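
The two quantities the legend relies on, the share of predictions above the 0.7 confidence cutoff (panel a) and the most likely nucleotide at each PAM position (panel b), can be illustrated with a minimal Python sketch. This is not the CICERO code: the function names, the input layout (a list of scores, and one nucleotide-to-probability mapping per position), and the toy values are assumptions made for illustration only.

```python
# Illustrative sketch only; names and data layout are hypothetical,
# not taken from CICERO or CRISPR-PAMdb.

NUCLEOTIDES = "ACGT"


def fraction_above(scores, threshold=0.7):
    """Fraction of confidence scores strictly above the threshold."""
    if not scores:
        return 0.0
    return sum(score > threshold for score in scores) / len(scores)


def consensus_pam(position_probs):
    """Pick the most likely nucleotide at each PAM position.

    position_probs is a list with one dict per position, mapping a
    nucleotide to its probability. Ties resolve in A, C, G, T order.
    """
    return "".join(
        max(NUCLEOTIDES, key=lambda nt: probs.get(nt, 0.0))
        for probs in position_probs
    )


# Toy inputs: five made-up scores and a 3-position profile
# (the figure uses 10 PAM positions).
toy_scores = [0.95, 0.80, 0.71, 0.60, 0.30]
toy_profile = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.05, "C": 0.03, "G": 0.9, "T": 0.02},
    {"A": 0.1, "C": 0.05, "G": 0.8, "T": 0.05},
]

print(fraction_above(toy_scores))
print(consensus_pam(toy_profile))
```

With real data, `toy_scores` would hold one confidence score per unlabeled Cas9 sequence and `toy_profile` would have 10 entries, one per PAM position.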