An updated evolutionary classification of CRISPR–Cas systems including rare variants

Makarova, Kira S.; Shmakov, Sergey A.; Wolf, Yuri I.; Mutz, Pascal; Altae-Tran, Han; Beisel, Chase L.; Brouns, Stan J. J.; Charpentier, Emmanuelle; Cheng, David; Doudna, Jennifer; Haft, Daniel H.; Horvath, Philippe; Moineau, Sylvain; Mojica, Francisco J. M.; Pausch, Patrick; Pinilla-Redondo, Rafael; Shah, Shiraz A.; Siksnys, Virginijus; Terns, Michael P.; Tordoff, Jesse; Venclovas, Česlovas; White, Malcolm F.; Yakunin, Alexander F.; Zhang, Feng; Garrett, Roger A.; Backofen, Rolf; van der Oost, John; Barrangou, Rodolphe; Koonin, Eugene V.

doi:10.1038/s41564-025-02180-8


Analysis
Open access
Published: 06 November 2025

Nature Microbiology volume 10, pages 3346–3361 (2025)

Abstract

The known diversity of CRISPR–Cas systems continues to expand. To encompass new discoveries, here we present an updated evolutionary classification of CRISPR–Cas systems. The updated CRISPR–Cas classification includes 2 classes, 7 types and 46 subtypes, compared with the 6 types and 33 subtypes in our previous survey 5 years ago. In addition, a classification of the cyclic oligoadenylate-dependent signalling pathway in type III systems is presented. We also discuss recently characterized alternative CRISPR–Cas functionalities, notably, type IV variants that cleave the target DNA and type V variants that inhibit the target replication without cleavage. Analysis of the abundance of CRISPR–Cas variants in genomes and metagenomes shows that the previously defined systems are relatively common, whereas the more recently characterized variants are comparatively rare. These low abundance variants comprise the long tail of the CRISPR–Cas distribution in prokaryotes and their viruses, and remain to be characterized experimentally.

Main

Clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated protein (Cas) systems are best known as the new generation of genome engineering tools^1,2. However, their primary natural role is adaptive immunity in bacteria and archaea that functions by recognizing and, typically, cleaving a specific sequence in the target DNA or RNA that is complementary to a unique spacer between CRISPR repeats. The CRISPR–Cas immune response includes three main stages: adaptation, expression and interference, which are discussed in many dedicated reviews covering various aspects of the molecular mechanisms of CRISPR–Cas functionality^3,4,5,6,7. CRISPR–Cas systems have a characteristic modular organization that roughly corresponds to the three stages of adaptive immunity (Fig. 1; details in Supplementary Note).

**Fig. 1: Modular organization of CRISPR–Cas systems.**

As expected of defence mechanisms, CRISPR–Cas systems show extensive diversity of the organization of their respective genome loci, cas gene composition as well as domain architectures and sequences of Cas proteins^3,4,8,9. In the nearly two decades since the discovery of the CRISPR–Cas function¹⁰, knowledge and understanding of this remarkable system has been steadily expanding through the mining of rapidly growing genomic and metagenomic databases. Classification of CRISPR–Cas systems based, to the maximum extent possible, on evolutionary relationships is essential for accurate description and further characterization of CRISPR–cas loci in newly sequenced bacterial and archaeal genomes and metagenomes, and hence, for the progress of the entire field of CRISPR research. However, CRISPR–Cas systems share no universal markers suitable for comprehensive phylogenetic analyses, and both the organization of CRISPR–cas loci and the Cas proteins themselves evolve fast, which makes the construction of a consistent, robust classification a major challenge. Three previous versions of CRISPR–Cas classification published in 2011, 2015 and 2020 (refs. ^8,11,12) employed a complex polythetic approach. For the purpose of classification, comparisons of the architecture and gene composition of CRISPR–cas loci were combined with clustering by sequence similarity and phylogenetic analysis of conserved Cas proteins, such as Cas1, the integrase that plays the key role in the adaptation stage. CRISPR–Cas types were delineated based on unique effector modules, whereas subtypes were defined less formally, based on a combination of the above criteria. A complementary development has been reported recently: Cas Protein Effector Database of Information and Assessment (CasPEDIA), a comprehensive classification of class 2 Cas enzymes based on their activity and target specificity¹³.

The 2020 CRISPR–Cas system classification included 6 types and 33 subtypes partitioned between two classes that differ in the architectures of their effector modules involved in CRISPR RNA (crRNA) processing and interference⁸. Since then concerted efforts have been undertaken to identify additional CRISPR–Cas systems and to decipher their evolutionary origins, resulting in the discovery of type VII, 13 additional subtypes and numerous unique variants (Extended Data Figs. 1–4). Compared with previously identified CRISPR–Cas systems, these recently discovered systems are rare, apparently coming from the long tail of the CRISPR–Cas distribution among prokaryotes that remains to be further explored. In addition, many previously unrecognized CRISPR-linked genes have been discovered and the mechanisms of several CRISPR–Cas subtypes have been elucidated. In this Analysis we describe the latest updates to the CRISPR–Cas classification and discuss the prospects of further discoveries being made as well as the current understanding of the main routes of the evolution of CRISPR–Cas systems.

Results

Updated classification of CRISPR–Cas systems

Distinct features in class 1 CRISPR–Cas systems

Although numerous class 1 CRISPR–Cas systems have been discovered since the 2020 classification, only type VII and two subtypes (III-G and III-H) have been explicitly added to the classification⁹. While working on this classification update, we identified a distinct variety of type III CRISPR–Cas systems, which seems to qualify as subtype III-I, bringing the total number of subtypes in class 1 to 21 (eight in type I, nine in type III, three in type IV and one in type VII) (Figs. 1 and 2a, Extended Data Figs. 1 and 2, and Supplementary Note).
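
The subtype counts quoted so far can be cross-checked with a few lines of arithmetic. The following is only a bookkeeping sketch: the variable names are illustrative, and the class 2 count is not stated in the text but derived here by subtracting the class 1 total from the 46 subtypes given in the Abstract.

```python
# Subtype counts as quoted in the text (class 1 breakdown and headline totals).
CLASS1_SUBTYPES = {"I": 8, "III": 9, "IV": 3, "VII": 1}
SUBTYPES_2020 = 33      # previous (2020) classification
NEW_SUBTYPES = 13       # subtypes added since 2020
TOTAL_SUBTYPES = 46     # updated classification

class1_total = sum(CLASS1_SUBTYPES.values())

# Assumption: not stated explicitly in the text, derived here by subtraction.
class2_total = TOTAL_SUBTYPES - class1_total

print(f"class 1 subtypes: {class1_total}")
print(f"class 2 subtypes (derived): {class2_total}")
print(f"2020 subtypes + new subtypes: {SUBTYPES_2020 + NEW_SUBTYPES}")
```

The class 1 breakdown sums to the 21 subtypes stated above, and the 2020 count plus the new subtypes reproduces the total of 46.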

**Fig. 2: New class 1 CRISPR–Cas systems.**

The previously undescribed type VII is represented by CRISPR–Cas systems found mostly in several taxonomically diverse archaeal genomes and containing a metallo-β-lactamase (β-CASP) effector nuclease^9,14. According to the CRISPR–Cas classification principles, the unique signature effector shared by these systems, designated Cas14, qualifies these loci as a new type. Cas14 is encoded in a predicted operon with Cas7 and Cas5, the subunits of the effector complex, and in some cases Cas6, which is a dedicated nuclease involved in crRNA processing in other class 1 systems (Fig. 2a). Type VII loci lack adaptation modules and repeats in the associated CRISPR array often contain multiple substitutions, suggesting that the arrays do not frequently incorporate new spacers. Analysis of the limited number of spacer hits indicates that these systems target transposable elements⁹. Notably, in addition to the β-CASP domain, the Cas14 protein contains a carboxy-terminal domain that structurally resembles the C-terminal domain of Cas10, the large subunit of the type III effector module, suggesting an evolutionary connection between types III and VII (Fig. 2a). This connection is also supported by the specific similarity between the Cas5 proteins of type VII and subtype III-D⁹. The catalytic residues of Cas7 that are required for the target RNA cleavage in type III systems are not conserved in type VII. Instead, type VII systems have been shown to target RNA in a crRNA-dependent manner, cleaving the target via the nuclease activity of Cas14 (Supplementary Table 1). Type VII systems seem to be simple and have probably evolved from type III via the reductive route as discussed below. However, the recently solved cryogenic-electron-microscopy structure of the type VII effector complex contains up to 12 subunits, with Cas14 binding to the Cas7 backbone via its Cas10 remnant domain, making this complex one of the largest among the class 1 systems¹⁴ (Fig. 2a and data file 1 in ref. ¹⁵; https://doi.org/10.5281/zenodo.15882620).

The three previously undescribed subtypes of type III CRISPR–Cas systems—that is, III-G (Sulfolobales-specific, previously reported as unclassified⁸), III-H (present in various archaea and a few bacterial metagenome-assembled genomes (MAGs)) and III-I (present in more than 160 genomes in the National Center for Biotechnology Information (NCBI) non-redundant (NR) database, mostly from the phyla Thermodesulfobacteriota and Chloroflexota)—are not closely related but share some features that suggest reductive evolution⁹ (Fig. 2a). In subtypes III-G and III-H, the polymerase/cyclase domain of Cas10, the large subunit of the effector complex, is inactivated as indicated by the replacement of the catalytic amino acids. The lost capacity to generate cyclic oligoadenylates (cOAs) correlates with the loss of genes encoding ancillary proteins containing a cOA-binding domain (such as CRISPR-associated Rossmann fold (CARF) or SMODS (second messenger oligonucleotide or dinucleotide synthetase)-associated and fused to various effector domains (SAVED)) fused to an effector domain, such as higher eukaryotes and prokaryotes nucleotide-binding (HEPN) RNase or other effectors that are typical of type III CRISPR–Cas systems⁸. Thus, these subtypes have lost the cOA signalling pathway that induces collateral RNase activity in most type III systems. Although subtype III-H is distantly related to III-F, a unique feature of III-H systems is a highly diverged small subunit (Cas11) that apparently has replaced the C-terminal domain of Cas10 (Fig. 2a). In subtype III-G, Csx26, the signature protein of this subtype, might replace Cas11 in the effector complexes (Fig. 2b and Extended Data Fig. 5). Both these new subtypes lack adaptation modules and no CRISPR array has so far been found in III-G loci, suggesting that this system recruits crRNAs from other CRISPR–cas loci in trans. 
Given the lack of conservation of catalytic aspartates in Cas7 and the presence of apparently active HD-nuclease domains in Cas10, both III-G and III-H are predicted to cleave DNA targets (Fig. 2a, Extended Data Fig. 2a and Supplementary Table 1).

The effector module of the new subtype III-I systems (Fig. 2a,b, Extended Data Figs. 2 and 6 and data file 2 in ref. ¹⁵) discovered during the present analysis has two unique features: (1) an extremely diverged Cas10 lacking the amino-terminal polymerase/cyclase domain; this protein lacks detectable sequence similarity to Cas10 but was confidently identified as a Cas10 homologue in the structure similarity search (distance-matrix alignment (DALI) Z-score = 10.9); and (2) a multidomain protein with a domain architecture resembling that of Cas7–11, the effector protein of subtype III-E, but apparently originating independently from a different variant of subtype III-D (Extended Data Fig. 6). The III-I effector protein consists of three fused Cas7 domains and a Cas11 domain that lacks both the N-terminal Cas7 domain and an insertion in the C-terminal Cas7 present in Cas7–11 (Extended Data Fig. 6). Based on the presence of conserved aspartates in each of the three Cas7 domains, the subtype III-I CRISPR–Cas system most probably cleaves RNA (data file 2 in ref. ¹⁵). We propose to denote the III-I effector Cas7-11i and accordingly amend the designation for the III-E effector to Cas7-11e.

In addition to type VII and the three previously undescribed subtypes, multiple variants of class 1 CRISPR–Cas systems with unique domain architectures and functional features have recently been discovered (Fig. 2c). Three of these variants (I-E2, I-F4 and IV-A2) encompass an HNH nuclease that is fused to Cas5, Cas8f and CasDinG proteins, respectively⁹. Robust crRNA-guided double-stranded DNA (dsDNA) cleavage activity has been demonstrated for each of these variants^9,16,17. Notably, the I-E2 and I-F4 variants typically lack Cas3 helicase-nuclease, which is responsible for the shredding of the DNA target in most type I CRISPR–Cas systems, so that the HNH nuclease seems to replace the nuclease activity of the HD-nuclease domain of Cas3. The HNH nuclease is typically encoded by mobile genetic elements (MGEs) such as group I self-splicing introns¹⁸. Cas9, the type II effector, also contains an HNH nuclease that is inserted into the other RuvC-like nuclease domain of Cas9 and is responsible for the target DNA strand (the strand that hybridizes to the crRNA during R-loop formation) cleavage. The discovery of the new HNH-encoding variants shows that this mobile nuclease has been coopted by different CRISPR–Cas systems on multiple independent occasions. However, unlike the case of Cas9 where the HNH nuclease is responsible for the cleavage of only one DNA strand in the dsDNA target, in the new variants HNH is the only effector nuclease that cleaves both strands^9,16,17. The IV-A2 variant is of special note, being the first type IV system shown to cleave the target. All other type IV-A and IV-B systems lack effector nucleases, apparently suppressing MGE reproduction via inhibition of transcription, at least in the case of IV-A^9,19. Type IV-C systems are predicted to target and cleave DNA via the HD domain but remain experimentally uncharacterized. Another new variant, I-E3, encompasses a distinct nuclease of the PD-(D/E)xK family that is fused to Cas11⁹. 
Some of these systems contain Cas3, including an apparently active HD-nuclease domain, so the role of the PD-(D/E)xK nuclease remains to be determined.

Another recurrent trend in CRISPR–Cas evolution is the recruitment of CRISPR–Cas systems by large transposons, enabling RNA-guided transposition. The CRISPR-associated Tn7-like transposons (CASTs) of subtypes I-F and I-B were introduced in the previous classification⁸ as I-F3 and I-B2 variants, respectively. More recently, three additional CAST varieties derived from subtypes I-D, I-C and IV-A^20,21,22 have been discovered and a I-A system was found to be associated with Mu-like transposons⁹ (Fig. 2d). Unlike most CASTs in which CRISPR–Cas effectors are inactivated (discussed later), some of the I-D CASTs are active, apparently representing the most recent capture of a CRISPR–Cas system by a transposon and, possibly, perform a dual function, acting both as a CAST and as a bona fide CRISPR–Cas system²¹. The I-A CASTs associated with Mu-like transposons have not yet been characterized experimentally and are expected to interact with the transposition machinery in a distinct manner because these transposons lack the TniQ–TnsD protein, an essential component of all previously discovered CASTs²¹. Finally, a distinct CAST variety (I-E5), also lacking TniQ–TnsD, was recently found in association with a distinct class of telomeric transposons²³. This CAST contains an active HD-nuclease domain in the Cas3 protein and might also perform a dual function. All CASTs, except for subtype IV-A, of which so far only one instance has been found, were included in the current classification as variants of the respective CRISPR–Cas subtypes (Fig. 2d and Extended Data Figs. 1 and 2).

Other rare derivatives of class 1 systems continue to be discovered, for example, a type IV variant with a cas core gene set of apparent chimaeric origin, with Cas7 derived from IV-B and Cas5 from IV-A, and associated with a RecD-like helicase^24,25, or a I-F variant that seems to be an intermediate between I-F1 and I-F2 variants (Extended Data Fig. 1). Despite low abundance, characterization of such variants can potentially shed light on CRISPR–Cas evolution and reveal novel functionalities.

Diversity of sensors and effectors in type III systems

Most of the type III CRISPR–Cas systems are markedly more complex than other types including a striking variety of accessory proteins^26,27. In the updated classification, type III is divided into nine subtypes, each of which encompasses a specific set of core effector genes and forms a distinct clade in the phylogenetic tree of Cas10 (Figs. 1 and 3a). The exceptions are subtype III-E, which lacks a cas10 gene, and subtype III-I, which has an extremely diverged Cas10 derivative. Nevertheless, comparison of loci organization and phylogenetic analysis of Cas7 clearly shows that both III-E and III-I are derived from different variants of subtype III-D²⁸ (Extended Data Figs. 2a and 6).

**Fig. 3: The built-in signalling pathway of type III CRISPR–Cas systems.**

The diversity of type III CRISPR–Cas systems is not adequately covered by the classification into subtypes and variants, requiring a more nuanced approach. A central feature of type III is the built-in signalling pathway in which, in response to target recognition, the polymerase/cyclase domain of Cas10 synthesizes either cOA or S-adenosyl methionine (SAM)–AMP, second messengers that are bound by the sensor domain of an ancillary protein. Binding of the messenger molecule induces a conformation change in the ancillary protein, resulting in activation of its effector domain (most often an HEPN RNase but in some variants distinct nucleases or other effectors), which causes growth arrest²⁹ (Fig. 3b). In addition, these signal transduction pathways are attenuated either by ring nucleases cleaving cOA^30,31 or by enzymes responsible for cleavage of SAM–AMP³² (Fig. 3b). Sensors, effectors and ring nucleases in these pathways come in multiple forms resulting in combinatorial diversity²⁶. Although the polymerase/cyclase domain of Cas10 is occasionally inactivated (subtypes III-C, III-G and III-H) or lost (subtype III-I), about 97% of the Cas10s are predicted to be active and capable of producing messengers (Fig. 3a, Extended Data Fig. 2 and data file 3 in ref. ¹⁵).
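
The logic of this signalling pathway (messenger synthesis on target recognition, sensor-dependent effector activation, attenuation by messenger cleavage) can be illustrated with a deliberately simplified toy model. This is a sketch under stated assumptions, not a quantitative description: the class name, the integer messenger "units" and the synthesis and decay rates are all hypothetical and chosen only to make the on/off behaviour visible.

```python
from dataclasses import dataclass


@dataclass
class TypeIIISignalling:
    """Toy model of the type III second-messenger pathway (illustrative only)."""

    cas10_cyclase_active: bool        # is the Cas10 polymerase/cyclase domain intact?
    has_sensor_effector: bool = True  # e.g. an ancillary CARF-HEPN protein
    has_ring_nuclease: bool = True    # messenger-degrading enzyme present?
    messenger: int = 0                # hypothetical, arbitrary units

    def step(self, target_bound: bool, synthesis: int = 5, decay: int = 2) -> bool:
        """Advance one time step; return True if the effector is active."""
        # Messenger is made only on target recognition by an active Cas10.
        if target_bound and self.cas10_cyclase_active:
            self.messenger += synthesis
        # A ring nuclease (or equivalent enzyme) attenuates the signal.
        if self.has_ring_nuclease:
            self.messenger = max(0, self.messenger - decay)
        # The effector is on while its sensor domain still has messenger to bind.
        return self.has_sensor_effector and self.messenger > 0


active = TypeIIISignalling(cas10_cyclase_active=True)
states = [active.step(True), active.step(False), active.step(False)]
print("active Cas10:", states)

inactive = TypeIIISignalling(cas10_cyclase_active=False)
print("inactivated cyclase:", inactive.step(True))
```

In this sketch, a system with an inactivated cyclase domain never switches its effector on, whereas an active system responds to the target and is then switched off again as the messenger is degraded.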

The cOA sensors in type III CRISPR–Cas systems are almost exclusively CARF and related SAVED domains^15,26,33. Whereas many ring nucleases also possess CARF domains, several ring nucleases with enzymatic domains unrelated to CARF have also been identified³¹. Classification of CARF domains by sequence similarity led to the identification of 11 families, seven of which mostly consist of proteins containing both sensor (CARF) and effector (mostly, HEPN RNase but also other enzymes) domains, three include known or predicted ring nucleases and one (Csm6) consists of proteins apparently performing all three functions^31,33 (Fig. 3b, Supplementary Table 2 and data file 4 in ref. ¹⁵). Here we introduce a systematic nomenclature for the major groups of the sensor and ring nuclease CARF domain-containing proteins (Crf1–11 families; Crf is an abbreviation of CARF that we introduce to differentiate it from the generic domain name) to facilitate systematic analysis and description of type III CRISPR–Cas systems and the relationships among them. To ensure compatibility with the published work, legacy names are also indicated if they exist (Supplementary Table 2). All experimentally characterized families of ring nucleases are designated as Crn1–Crn5, with legacy names also retained (Supplementary Table 2).

The most common cOA-activated effectors are HEPN RNases but a variety of other unrelated effectors have been identified, including RNases of the PIN and RelE families, DNases of the PD-(D/E)xK and HD families, proteases of the Caspase and Lon families, several other enzymes as well as transcriptional regulators and integral membrane proteins²⁶ (Fig. 3b). Whereas the cOA pathway is well-established, components of the SAM–AMP pathway have not been thoroughly characterized. A membrane protein homologous to the magnesium transporter CorA that was identified in some III-B and III-D loci and shown to sense SAM–AMP via a dedicated sensor domain seems to be a signature of this pathway³². However, it remains unclear if the CorA homologue in III-D systems senses SAM–AMP or another signalling molecule, especially given that, in contrast to the III-B loci, known SAM–AMP cleavage enzymes are not encoded in the III-D loci (Fig. 3a,b).

Another notable clade in the Cas10 tree is the III-A2 variant, which lacks an identifiable Cas11 (Figs. 2e and 3a). In this variant, Cas10 is predicted to synthesize cOA, which in some of the systems probably directly activates the NucC nuclease effector³⁴. The conservation of the catalytic aspartate in Cas7 suggests that this system might target both RNA, via Cas7, and DNA, via NucC (data file 3 in ref. ¹⁵).

Both sensors and effectors are scattered across the clades of the Cas10 tree, which is suggestive of extensive module shuffling and horizontal gene transfer shaping of the cOA and SAM–AMP signalling pathways in type III systems (Fig. 3a and data file 3 in ref. ¹⁵). Despite the extensive shuffling of signalling pathway components, some trends are notable^33,35. First, ring nucleases are an important component of the signalling pathway and are present in most of the loci encoding enzymatically active Cas10 capable of cOA synthesis (Fig. 3a and data file 3 in ref. ¹⁵). Second, SAM–AMP is cleaved by a distinct set of enzymes typically encoded in the respective loci (Fig. 3a and Supplementary Table 2). Investigation of exceptions from these trends might lead to the discovery of Cas10s synthesizing different messengers as well as corresponding novel sensors and nucleases. A proof of principle is the identification of the Crf1_Csm6 family of proteins with a dual function of sensor and ring nuclease associated with III-A loci in which no other ring nuclease was detected³⁶. Other cases such as the aforementioned CorA-associated III-D systems require further study.

Expansion of class 2 CRISPR–Cas systems

The discovery of numerous class 2 CRISPR–Cas systems, primarily subtypes of type V, was the hallmark of the 2020 update of the CRISPR–Cas classification⁸. The distinguishing feature of these class 2 CRISPR–Cas systems is that their effector machinery consists of a single multidomain protein, namely Cas9 in type II, Cas12 in type V and Cas13 in type VI (Fig. 1 and Extended Data Figs. 3 and 4). Class 2 CRISPR–Cas systems are more simply organized than those of class 1, so classification relies primarily on comparison and phylogenetic analysis of the single large effector signature proteins. The continued discovery of distinct variants in the last few years led to the delineation of subtype II-D. This subtype includes the former II-C2 variant characterized by small archaeal Cas9s⁸, designated II-D in CasPEDIA¹³, together with the recently described variant encompassing the smallest known possibly ancestral Cas9s and also designated II-D³⁷. Despite variation in size, phylogenetic analysis shows that Cas9d proteins confidently group together and form a deep branch in the Cas9 tree (Fig. 4a and data file 4 in ref. ¹⁵). Furthermore, the respective loci share the adaptation module containing cas1, cas2 and cas4 genes, with all Cas1 proteins forming a monophyletic branch in the Cas1 tree, supporting the robust classification of these loci as a single new subtype (Fig. 4a,b, Extended Data Fig. 3a and data file 4 in ref. ¹⁵). Notably, II-D systems are common among symbiotic and parasitic nanoarchaea of the Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota and Nanohaloarchaeota (DPANN) superphylum, along with some bacteria, whereas previously class 2 systems were considered a bacterial staple absent from the archaea. It has been shown that, like all type II CRISPR–Cas systems, type II-D loci encode trans-activating CRISPR RNA (tracrRNA)³⁷. 
Given that, like all other archaea, DPANN members encoding type II-D CRISPR–Cas systems lack RNase III, the pre-crRNA in this case is likely to be processed either by an alternative RNase or by Cas9d itself. A rare, but notable, variant II-C2 encoded in phage genomes encompasses a Cas9 protein with an inactivated RuvC nuclease domain but apparently active HNH nuclease, which might function as an anti-CRISPR protein⁹ (Fig. 4b).

**Fig. 4: New class 2 CRISPR–Cas systems.**

As in the 2020 CRISPR–Cas classification⁸, the greatest proliferation of subtypes and variants was observed within type V. Comprehensive phylogenomic analysis of transposon-encoded RNA-guided TnpB nucleases, the evolutionary ancestors of Cas12, showed that emergence of new type V CRISPR–Cas variants through the association of TnpB with CRISPR arrays and/or adaptation modules is not a rare event in the evolution of bacteria and archaea³⁸. About 50 such independent events have been inferred and, beyond doubt, many more remain to be discovered in the ever-expanding genomic and metagenomic databases (Extended Data Fig. 4b and data files 5 and 6 in ref. ¹⁵). It is not always easy to decide which of these loci qualify as subtypes and which should remain variants. Variants can be upgraded to subtypes as they are studied experimentally and/or expanded through continued genome and metagenome exploration. In the last few years, six previously uncharacterized type V subtypes and 12 variants have been studied experimentally^9,39 (Fig. 4c–e and Extended Data Fig. 4a,b). At least five additional variants seem to be strong subtype candidates based on phylogenomic analysis (Extended Data Fig. 4b). One notable trend in type V evolution is the flexibility of target and collateral substrate recognition, including nucleases that can target and cleave both single-stranded (ss) RNA and ssDNA, such as Cas12g⁴⁰, or are involved in both RNA targeting and collateral cleavage of ssRNA, ssDNA and dsDNA, such as Cas12a2 (ref. ⁴¹). Another trend is the inactivation of the RuvC-like nuclease domain of Cas12 that occurred independently on multiple occasions. 
At least some of these inactivated type V subtypes and variants nevertheless retain the interference capacity via mechanisms that do not include target DNA cleavage, such as blocking transcription via crRNA-guided binding of Cas12 to the target as demonstrated for subtypes V-C⁴² and V-M^43,44, and confidently predicted for V-O and V-P, and possibly Cas12b3 (Extended Data Fig. 7). Other inactivated type V subtypes have been repurposed by transposons for RNA-guided transposition (discussed later; Extended Data Fig. 4b).

A signature feature of type II CRISPR–Cas systems is the involvement of tracrRNA in crRNA maturation and interference⁴⁵. Among the type V systems, some also use tracrRNA (and are thus predicted to employ RNase III for pre-crRNA processing), whereas pre-crRNA processing is catalysed by a distinct active site in Cas12 itself in at least half of the characterized subtypes⁴⁰. Thus, tracrRNA apparently evolved independently on many occasions in type II and type V CRISPR–Cas systems⁴⁶.

Two previously unrecognized subtypes of type VI, namely VI-E and VI-F, have been established. Both Cas13e and Cas13f are deep branches in the Cas13 dendrogram (Fig. 4e,f and Extended Data Fig. 3b–d). Cas13e is by far the smallest, most compact Cas13 protein and a potential early intermediate in type VI evolution⁴⁷. Subtype VI-F, which was mentioned as an unclassified variant in the 2020 CRISPR–Cas classification, is so far limited to a single bacterial genus, Brachyspira. Structure analysis showed that Cas13b proteins share features distinct from those of other type VI effectors, suggesting the possibility of two independent origins of Cas13 from HEPN-containing toxins⁴⁷. The group of subtype VI-B systems with the smallest known effectors is referred to as variant VI-B3 (Extended Data Fig. 3d).

Beyond the main effector nucleases, many class 2 CRISPR–Cas systems encompass a growing set of accessory proteins that augment the immune response, including some reported or characterized since the publication of the 2020 CRISPR–Cas classification. Subtype VI-B systems show a particularly notable diversity in both Cas13b size and the presence of associated genes^47,48. Csx27 and Csx28 were previously identified as accessory proteins and shown to augment the immune function of Cas13b⁴⁹. It has subsequently been shown that, following Cas13b activation, Csx28 forms an octameric pore that depolarizes the inner membrane⁵⁰. Some of the other accessory proteins in class 2 CRISPR–Cas systems include the predicted PIN-domain nuclease PcrIIC1 that is encoded in some II-C loci and enhances the efficiency and promiscuity of DNA targeting^51,52 and an RNase H fold nuclease of the DUF3800 family encoded in some VI-F and V-A2 loci (Fig. 4g). More accessory proteins associated with class 2 systems and enhancing their immune function via different mechanisms probably await discovery.

Distribution of CRISPR–Cas systems in bacteria and archaea

As generally expected of antivirus defence systems⁵³, the different types and subtypes of CRISPR–Cas systems are highly non-uniformly distributed among bacterial and archaeal lineages at all levels, from phyla to isolates within species. We surveyed CRISPR–cas loci in the current collection of complete bacterial and archaeal genomes. Among the 47,545 analysed genomes, complete CRISPR–Cas systems were identified in a majority of archaea (410 of 683 genomes, prevalence of 52% after adjustment for sampling bias; Methods) but in less than half of bacteria (16,488 of 46,862 genomes, 41%; Fig. 5a and data file 7 in ref. ¹⁵). As observed earlier⁸, an overwhelming majority of both bacterial and archaeal thermophiles tend to possess complete CRISPR–Cas systems (weighted prevalence of 84 and 90% for thermophiles and hyperthermophiles, respectively), whereas they are substantially less prevalent in mesophiles and psychrophiles (46 and 38%, respectively). In this census, the fraction of archaeal genomes containing CRISPR–cas loci was much lower than that in the previous analyses. Clearly, this drop in CRISPR–Cas prevalence is due to the recently increased genome sequencing and broader sampling of mesophilic archaea, which shows that high prevalence of CRISPR–Cas is a signature of thermophiles rather than archaea as such. Among archaeal phyla, Bathyarchaeota (marine sediment mesophiles) stand out for the hitherto complete lack of detectable CRISPR–cas loci despite the availability of 23 diverse, completely sequenced genomes. Archaea (specifically, Ferroglobus placidus) harbour the only type VII system among the completely sequenced genomes. In agreement with earlier observations, the archaeal CRISPR–Cas repertoire is dominated by class 1 systems (Fig. 5b), although a few class 2 systems are scattered across Methanobacteriota and Thermoproteota, and (as pointed out earlier) subtype II-D is widely represented in the DPANN superphylum. 
In most archaea, type I systems comprise the majority, except in Thermoproteota where type III accounts for 51% of the CRISPR–cas loci. The most abundant subtype across archaeal phyla is subtype I-B, with two exceptions: subtype I-A in Thermoproteota and subtype III-A in Thermoplasmatota. The high prevalence of subtypes III-B and III-D in Thermoproteota and Nitrososphaerota, respectively, is also notable.
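
The raw genome counts and the bias-adjusted prevalence values quoted above differ, and a short calculation makes the distinction concrete. The sketch below computes the raw fractions from the stated counts and then shows, on invented toy data, how down-weighting oversampled lineages can shift a prevalence estimate. The weighting scheme (one unit of weight shared across closely related genomes) is an assumption for illustration only; the adjustment actually used is described in Methods, which is not reproduced here.

```python
def raw_prevalence(positive: int, total: int) -> float:
    """Fraction of genomes that carry a complete CRISPR-Cas system."""
    return positive / total


def weighted_prevalence(genomes: list[tuple[bool, float]]) -> float:
    """Weighted prevalence from (has_crispr, weight) pairs."""
    total_weight = sum(weight for _, weight in genomes)
    positive_weight = sum(weight for has_crispr, weight in genomes if has_crispr)
    return positive_weight / total_weight


# Counts quoted in the text.
archaea_raw = raw_prevalence(410, 683)
bacteria_raw = raw_prevalence(16_488, 46_862)
print(f"archaea raw fraction:  {archaea_raw:.3f}")
print(f"bacteria raw fraction: {bacteria_raw:.3f}")

# Hypothetical toy data: four near-identical CRISPR-positive genomes from one
# oversampled species share a single unit of weight; two unrelated
# CRISPR-negative genomes carry full weight each.
toy = [(True, 0.25)] * 4 + [(False, 1.0)] * 2
print(f"toy raw fraction:        {raw_prevalence(4, 6):.3f}")
print(f"toy weighted prevalence: {weighted_prevalence(toy):.3f}")
```

The raw fractions computed from the stated counts are not expected to match the reported 52% and 41%, because those values are given after adjustment for sampling bias.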

**Fig. 5: Distribution of CRISPR–Cas systems across bacteria and archaea.**

Among the well-sampled bacterial phyla, only Gemmatimonadota completely lack detectable CRISPR–cas loci; in Chlamydiota, only one genome (‘Candidatus Protochlamydia naegleriophila’) of the 437 analysed harbours a single type I-E locus (Fig. 5b). Class 1 is also substantially more abundant than class 2 in bacteria (Fig. 5b), with a few exceptions. These include Mycoplasmatota, Bacteroidota and Bdellovibrionota, where type II systems dominate the landscape. Type V and VI systems are much less abundant across the diversity of prokaryotes than types I–IV, although type V systems are comparatively prominent in Spirochaetota, Cyanobacteriota, Planctomycetota and Verrucomicrobiota (8–13% of all CRISPR–cas loci), whereas type VI systems are common in Fusobacteriota and Bacteroidota (9 and 5%, respectively).

Compared with the previous census⁸, the new types and subtypes identified in the intervening five years comprise only 0.3% of the CRISPR–Cas repertoire in the completely sequenced prokaryotic genomes (Fig. 5c). New subtypes of type V and type III account for most of this expansion. A qualitatively similar picture emerges from the survey of the clustered NR database (Fig. 5d). Thus, whereas all common types and subtypes of CRISPR–Cas systems seem to be already known, the discovery of new, increasingly rare, variants will probably continue in the coming years, extending the tail of the distribution.

Many CRISPR–Cas systems are currently found only in incomplete genomes or metagenomes. Not surprisingly, most of these belong to the rare and diverse type V subtypes V-C, E, G, H, I, J, L, N, O and P, and most of the new variants. Furthermore, only one representative of type VII and subtype III-I each and none of subtype III-H are present among complete genomes, whereas dozens of these loci are identifiable in the NCBI NR database. Nevertheless, the recent massive effort on prediction of novel CRISPR–Cas systems in a large metagenomic dataset revealed only a few novel systems at the type and subtype levels⁹. It should be noted that a variety of unusual class 1 variants, such as those associated with MGEs and those containing HNH nucleases fused to different Cas proteins, are difficult to identify, requiring dedicated computational approaches and manual curation to distinguish them from the mainstream CRISPR–Cas systems.

Discussion

The updated CRISPR–Cas classification presented here consists of two classes, seven types and 46 subtypes. Because the types and subtypes of CRISPR–Cas systems that are widespread in bacteria, archaea and their MGEs seem to be already known, the current classification is likely to remain stable in its main features. By contrast, the tail of the CRISPR–Cas diversity distribution is very long and many subtypes and variants, and possibly even new types, remain to be discovered. However, these yet unknown varieties of CRISPR–Cas systems are bound to be increasingly rare and the discovery of previously unknown variants requires mining of enormous amounts of genome and metagenome sequences using dedicated computational pipelines.

The discovery and comparative analysis of previously unknown CRISPR–Cas systems have led to substantial insights into CRISPR–Cas origins and evolution. Some overarching trends are becoming apparent: (1) interplay between structural and functional complexification, and reductive, streamlining evolution of CRISPR–Cas systems; (2) tight evolutionary entanglement between CRISPR–Cas and various types of MGEs^54,55, as per the ‘guns for hire’ concept⁵⁶; (3) repeated exaptation of CRISPR–Cas systems for other non-defence functions⁵⁷; (4) in situ shuffling of effector modules; and (5) acquisition of ancillary genes (Fig. 6 and Extended Data Figs. 8–10; details in Supplementary Discussion). Beyond these trends, the evolutionary trajectories of class 1 and class 2 seem to be quite different and so are those of the adaptation and effector modules of the CRISPR–Cas systems.

**Fig. 6: Main trajectories and processes in the evolution of CRISPR–Cas systems.**

In the near future artificial intelligence approaches will undoubtedly play an increasingly prominent role in the discovery of previously uncharacterized CRISPR–Cas variants with predefined features⁵⁸. Such new discoveries could also come from targeted exploration of specific environments, in particular, extreme environments. Apart from new CRISPR–Cas varieties, a vast field for future studies is the widespread CRISPR–Cas exaptation, in particular, by various MGEs. Functional characterization of such repurposed CRISPR–Cas systems should help clarify fundamental aspects of co-evolution between prokaryotes and MGEs.

Methods

The prokaryotic genome database

A collection of 46,862 bacterial and 683 archaeal genomes available from GenBank and RefSeq that were completely sequenced or assembled at the chromosome level was downloaded in November 2023. All 75 × 10⁶ protein sequences with unique identifiers were clustered using MMSEQS2 (ref. ⁵⁹) at two levels: (1) with similarity and coverage thresholds of 0.9 and (2) with a similarity threshold of 0.5 and a coverage threshold of 0.33 (--cluster-mode 2 (greedy incremental clustering) and --cov-mode 1 (coverage calculated on the target sequence)). The optimal growth temperature was obtained for 24,711 genomes by reconciling the data downloaded from https://melnikovlab.com/gshc/ (on 3 February 2025) with genome assembly identifiers and species-level taxonomy identifiers in our dataset.

CRISPR array detection

A total of 320,239 CRISPR arrays and CRISPR-like repetitive sequences were identified within the prokaryotic genome database using the standalone version of CRISPRCasFinder 4.2.20 (ref. ⁶⁰) with default parameters, except for minimum direct repeat length of 20 base pairs.

Protein annotation

Proteins were annotated using a PSI-BLAST⁶¹ search against the database of all protein sequences encoded by the prokaryotic genomes in the dataset described above. Queries for these searches included public NCBI Conserved Domain Database profiles⁶² (Pfam, NCBI conserved domains (CD) and clusters of orthologous genes (COGs)), excluding public profiles for Cas proteins and including new Cas protein profiles developed in this study. A non-overlapping set of best-scoring profile hits with a PSI-BLAST e-value threshold of ≤0.0001 was assigned to each protein sequence.
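The assignment of a non-overlapping set of best-scoring profile hits can be illustrated with a minimal sketch. It assumes a greedy, score-ordered selection with zero tolerated overlap between footprints; the text does not specify the exact overlap rule, and the `Hit` record, the `select_hits` function and the profile names are hypothetical.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    profile: str   # profile identifier (hypothetical names in the example)
    start: int     # 1-based start of the hit footprint on the protein
    end: int       # inclusive end of the footprint
    score: float   # bit score of the profile hit
    evalue: float

def select_hits(hits, max_evalue=1e-4):
    """Greedily keep the best-scoring hits whose footprints do not overlap."""
    kept = []
    candidates = [h for h in hits if h.evalue <= max_evalue]
    for hit in sorted(candidates, key=lambda h: h.score, reverse=True):
        # Accept the hit only if it is disjoint from every hit kept so far
        # (assumption: no partial overlap is tolerated).
        if all(hit.end < k.start or hit.start > k.end for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)

hits = [
    Hit("cas3_helicase", 1, 500, 320.0, 1e-90),
    Hit("cas3_HD", 480, 700, 150.0, 1e-40),   # overlaps the helicase hit
    Hit("cas2", 720, 800, 90.0, 1e-20),
    Hit("weak_domain", 810, 900, 25.0, 0.5),  # fails the e-value threshold
]
selected = select_hits(hits)
```

In this toy input, the lower-scoring overlapping hit and the hit above the e-value threshold are discarded, leaving two annotations for the protein.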

Phylogenetic analysis

Genome tree

For approximate assessment of the phylogenetic relationships between the genomes, protein sequences from the 54 nearly universal bacterial COGs and 55 nearly universal archaeal COGs were aligned using FAMSA⁶³. For COGs represented by two or more paralogues, a single representative per genome with the highest similarity to the alignment consensus was selected. After removing low homogeneity and highly gapped columns⁶⁴, alignments of bacterial and archaeal proteins were concatenated separately and the corresponding approximate maximum likelihood trees were constructed using FastTree with gamma-distributed site rates and Whelan and Goldman evolutionary model⁶⁵.

Phylogenetic trees

For the Cas9 protein family, the following procedure was applied to facilitate the multiple sequence alignment (MSA) construction: (1) clustering using MMSEQS2 to obtain a non-redundant dataset (similarity and coverage thresholds of 0.9), (2) alignment of cluster representatives using Muscle5 (ref. ⁷⁰), (3) removal of poorly aligned sequences or fragments, (4) filtering for alignment columns with a homogeneity value of ≥0.05 and gap fraction of <0.667, and (5) use of the filtered alignment as the input for FastTree⁶⁵ to construct a maximum likelihood phylogenetic tree with the Whelan and Goldman evolutionary model and gamma-distributed site rates; the same program was used to calculate support values (data file 4 in ref. ¹⁵).

To build trees for Cas1, Cas3 and the RuvC domains of the Cas12/TnpB family, a modification of the above procedure was used as follows: (1) clustering using MMSEQS2 with a similarity threshold of 0.5 and a coverage threshold of 0.33, (2) alignment of sequences inside each cluster using Muscle5, (3) extraction of the consensus sequence from each alignment, (4) alignment of the cluster consensus sequences, (5) expansion of this alignment into a pseudo-MSA⁶⁶, and (6) filtering of the alignment and building of the FastTree tree as described in the previous paragraph (data file 4 in ref. ¹⁵).

Unweighted pair group method with arithmetic mean dendrograms

To investigate the hierarchy of similarity between protein clusters that do not readily align, relative HHSEARCH⁶⁷ scores (\({S}_{\rm{A,B}}/\min ({S}_{\rm{A,A}},{S}_{\rm{B,B}})\), where \({S}_{\rm{A,B}}\) is the score between profiles A and B) were obtained for cluster comparison and converted to distances using the negative log transformation. Hierarchical unweighted pair group method with arithmetic mean (UPGMA) dendrograms were then constructed using scipy.cluster.hierarchy and the ‘average’ method in Python or hclust and the ‘average’ method in R. This approach was applied for Cas8/Cas10s (Extended Data Fig. 10), Cas12 (Extended Data Fig. 4b), Cas13 (Extended Data Fig. 3c), Cas7, Cas5 and Cas11 (data file 4 in ref. ¹⁵). The specific settings are indicated in the respective figure legends.
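As a rough illustration of this procedure, the sketch below converts a matrix of profile–profile scores into distances via the relative score and its negative natural logarithm, then builds an average-linkage (UPGMA) dendrogram. It is written in plain Python rather than with scipy.cluster.hierarchy so that it is self-contained; the function names, the symmetric score matrix, the clipping of the relative score and the example scores are all assumptions made for illustration.

```python
import math

def relative_score_distances(scores):
    """scores[i][j] is a profile-profile score; the diagonal holds self-scores."""
    n = len(scores)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rel = scores[i][j] / min(scores[i][i], scores[j][j])
            rel = min(max(rel, 1e-9), 1.0)  # assumption: guard against log(0) and rel > 1
            dist[i][j] = dist[j][i] = -math.log(rel)
    return dist

def upgma(dist, labels):
    """Return a nested tuple (left, right, merge_distance) for the whole dendrogram."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # Average linkage: size-weighted mean of distances to the merged pair.
            d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b, dab), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical scores for three profiles A, B and C.
scores = [[100, 50, 10],
          [50, 80, 8],
          [10, 8, 120]]
tree = upgma(relative_score_distances(scores), ["A", "B", "C"])
```

Here profiles A and B merge first, and C joins them at a larger distance. The merge value stored in each node is the cluster-to-cluster distance, which is the convention scipy also uses for linkage heights.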

Hybrid dendrograms

Hybrid dendrograms were built as described previously⁸. Briefly, the FastTree program⁶⁵ (Whelan and Goldman evolutionary model, gamma-distributed site rates) was used to infer relationships within alignable clusters and the relationships between these clusters were inferred from HHalign pairwise scores using the matrix-based UPGMA approach as described in the previous section. The FastTree trees built for clusters were grafted onto the tips of the profile similarity-based UPGMA dendrogram. Such hybrid dendrograms were built for the Cas10 and CARF families (Fig. 3a and data file 4 in ref. ¹⁵).

Updating Cas protein profiles

Using NCBI Conserved Domain Database Cas protein profiles (Pfam, CD and COG)⁶² and the previously published Cas protein profile sets⁸, Cas proteins were retrieved from the prokaryotic genome database. Preliminary CRISPR–cas genomic islands were then assembled (ten genes flanking the genes cas1–cas14 from each side) for tentatively identified type I, II, III, IV and VII systems (the type V and VI systems were analysed separately). Unknown genes adjacent to known cas genes or CRISPR arrays were manually analysed using tools for detection of sequence and structural similarity, including HHPred for sequences⁶⁷ and AF2 modelling⁶⁸, followed by DALI⁶⁹ for structures. Using the retrieved Cas protein sequences and sequences from the previously published CRISPR–Cas profiles⁸, sets of protein sequences were assembled for the following major protein families: Cas1, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, Cas10d, Cas11, Cas14, Csx19, Csx22, Csx24, Csx25, Csx26 and CARF. Cas12 and Cas13 proteins were analysed separately as described below.

Profiles were constructed for each protein set using the following pipeline. Representatives of each protein family were selected from the 0.9 similarity clusters of the prokaryotic protein dataset (the first member of each cluster, as reported by MMSEQS2). PSI-BLAST footprints from cognate CRISPR–Cas profiles were retrieved from this set of cluster representative sequences. For manually annotated CRISPR-linked genes (unknown genes from the CRISPR–Cas islands described earlier), the entire sequence was taken as a footprint. The footprints were further clustered with a similarity threshold of 0.7 and coverage threshold of 0.9 using MMSEQS2. Profiles were generated for each cluster of the footprints according to the following procedure. The sequences within each cluster were aligned using FAMSA for clusters including >500 sequences, muscle5 with the ‘super5’ option for clusters including 101–500 sequences and muscle5 with the ‘mpc’ option for clusters including ≤100 sequences⁷⁰. To filter fragments or partial sequences, the resulting alignment was used as a PSI-BLAST query against all cluster sequences and sequences matching <75% of the profile length were removed. The cleaned sequence set was realigned as described earlier.
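The size-dependent choice of aligner and the fragment filter can be summarized in a short sketch. The function names and the input format are hypothetical, and the returned strings are descriptive labels rather than actual command lines for FAMSA or muscle5.

```python
def choose_aligner(n_sequences):
    """Pick an alignment strategy from the cluster size, following the thresholds in the text."""
    if n_sequences > 500:
        return "famsa"
    if n_sequences > 100:
        return "muscle5 super5"   # clusters of 101-500 sequences
    return "muscle5 mpc"          # clusters of 100 sequences or fewer

def drop_fragments(matched_positions, profile_length, min_fraction=0.75):
    """Keep sequence ids that match at least min_fraction of the profile length.

    matched_positions maps a sequence id to the number of profile positions
    covered by its PSI-BLAST match (hypothetical input format).
    """
    return {
        seq_id
        for seq_id, matched in matched_positions.items()
        if matched / profile_length >= min_fraction
    }

# Toy example: a profile of length 400 and two candidate sequences.
kept = drop_fragments({"full_length": 300, "fragment": 200}, profile_length=400)
```

A sequence covering exactly 75% of the profile is retained here, because the text removes only sequences matching less than 75% of the profile length.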

Because the legacy Cas protein profiles (Pfam, CD, COG and previously constructed custom profiles^8,62) often include inconsistently defined boundaries for the same domain, their footprints could differ strongly with respect to mapping on the protein structure. To mitigate this and obtain a consistent set of homologous sequences, the mapping and realigning procedure described above was reapplied using the cluster profile as the query for PSI-BLAST.

Each resulting 0.7 similarity cluster was used as a PSI-BLAST query against the prokaryotic database to identify hits (e-value cutoff of 0.0001) outside CRISPR–cas islands, which were treated as false positives for further calculations. A reduced set of clusters representing closely related protein groups was constructed using the 0.7 similarity cluster profiles and the hierarchical clustering procedure described earlier with a tree depth cutoff of 0.8. Profiles were generated for the resulting clusters and these clusters were used to generate a distance matrix as described earlier. Further clustering was performed using the neighbour-joining method⁷¹ until convergence. In each neighbour-joining iteration, sequences from the closest cluster profile pair were merged, a new profile was generated with the approach described above and the distance matrix was recalculated. Clusters were merged only if the PSI-BLAST results for the merged profile contained more than 98% of the original sequences and no more than ten new false positives (except Cas3 and CARF genes, where false-positive counts were not used for the neighbour-joining profile generation procedure given the abundance of homologues outside CRISPR–Cas systems).
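The acceptance rule applied at each merging iteration can be expressed as a small predicate. This is a sketch under stated assumptions: hits outside CRISPR–cas islands are taken as false positives, "new" false positives are those not already retrieved by the parent profiles, and all names (`accept_merge`, `island_ids`, `previous_fp`, `count_fp`) are hypothetical.

```python
def accept_merge(original_ids, merged_hits, island_ids, previous_fp,
                 min_recovered=0.98, max_new_fp=10, count_fp=True):
    """Decide whether two cluster profiles may be merged.

    original_ids: sequence ids in the two clusters being merged
    merged_hits:  ids retrieved by PSI-BLAST with the merged profile
    island_ids:   ids of proteins encoded inside CRISPR-cas islands
    previous_fp:  false positives already retrieved by the parent profiles
    count_fp:     set to False for families such as Cas3 and CARF
    """
    recovered = len(original_ids & merged_hits) / len(original_ids)
    if recovered <= min_recovered:   # must recover MORE than 98%
        return False
    if not count_fp:
        return True
    new_fp = (merged_hits - island_ids) - previous_fp
    return len(new_fp) <= max_new_fp

# Toy example: 100 original sequences, 99 recovered, 5 new off-island hits.
original = {f"s{i}" for i in range(100)}
hits = {f"s{i}" for i in range(99)} | {f"fp{i}" for i in range(5)}
ok = accept_merge(original, hits, island_ids=original, previous_fp=set())
```

The `count_fp` switch mirrors the exception described for Cas3 and CARF, where false-positive counts were not used.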

The resulting clusters were manually reviewed. Clusters containing ≤3 sequences were removed if those sequences were represented in the PSI-BLAST results (e-value cutoff of 0.0001) of larger clusters. In other cases, and for CRISPR–Cas genes that are poorly represented (<3 copies) in the prokaryotic database, profiles were constructed using sequences from the clustered NR. Sequences of the respective proteins were used as PSI-BLAST queries for three iterations against the clustered NR database https://ncbiinsights.ncbi.nlm.nih.gov/2022/05/02/clusterednr_1/ to retrieve additional sequences and align them as described earlier.

Cas12 profiles were derived from distinct branches of phylogenetic trees and UPGMA dendrograms reported previously^38,39. The Cas12b3 alignment was built using the sequences from NR. For all Cas12 families, both near-full-size alignments³⁸ and alignments of the RuvC-like nuclease domains alone were used to construct profiles.

Profiles for Cas13a, -c and -d were derived from previously published alignments⁸; Cas13b and Cas13f alignments were constructed in this Analysis from recently reported sequences^47,48,72,73. For all Cas13 families, separate profiles were constructed for the N-terminal and C-terminal HEPN domains.

Estimation of recombination within CRISPR–cas loci

A rough estimate of the extent of recombination between modules within CRISPR–cas loci was obtained by tallying the co-occurrence of cas1 with distinct effector genes (cas8, cas9, cas10 or cas12) within the same locus. If Cas1 proteins from the same strict cluster (90% identity) co-occurred with effectors from different permissive clusters (50% identity), a recombination event between the adaptation module and the effector module was registered, producing a conservative estimate of inter-module swaps.
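A minimal sketch of such a tally follows, reading the criterion as the same strict Cas1 cluster being observed alongside effectors from more than one permissive cluster. The counting rule (number of distinct effector clusters minus one per Cas1 cluster), the function name and the cluster labels are assumptions; the text does not state how multiple partners were counted.

```python
from collections import defaultdict

def count_module_swaps(loci):
    """Count putative adaptation/effector module swaps.

    loci: iterable of (cas1_strict_cluster, effector_permissive_cluster) pairs,
    one pair per CRISPR-cas locus.
    """
    partners = defaultdict(set)
    for cas1_cluster, effector_cluster in loci:
        partners[cas1_cluster].add(effector_cluster)
    # Assumption: each additional effector cluster seen with the same
    # Cas1 cluster counts as one swap.
    return sum(len(effectors) - 1 for effectors in partners.values())

loci = [
    ("cas1_c1", "cas8_p1"),
    ("cas1_c1", "cas8_p1"),    # same pairing again, no new event
    ("cas1_c1", "cas10_p7"),   # same Cas1 cluster, different effector cluster
    ("cas1_c2", "cas9_p3"),
]
swaps = count_module_swaps(loci)
```

Because near-identical Cas1 proteins are required while effectors only need to fall into different permissive clusters, a tally of this kind misses older swaps and is therefore conservative.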

Estimation of abundance of CRISPR–Cas systems

Profiles of large subunits of effector complexes of class 1 systems and effector proteins of class 2 systems were searched against the NCBI clustered NR database using MMSEQS2 (ref. ⁵⁹). For most of the CRISPR effectors, the corresponding profiles were used as queries in a maximum sensitivity (-s 7.5) MMSEQS2 search with an e-value threshold of 0.0001. For type V effectors, sequences in Cas12 profiles were clustered using MMSEQS2 with a similarity threshold of 0.9, and the cluster representatives were used as queries in a maximum sensitivity (-s 7.5) MMSEQS2 search with an e-value threshold of 0.0001 and similarity threshold of 0.6 (as reported by MMSEQS2). The best non-overlapping hits that covered at least 0.75 of the query length (either profile consensus or the individual query protein) were recorded for each target protein. The number of hits was then assigned to each type and subtype, corresponding to the query provenance, and used to approximate the relative abundance of CRISPR–Cas types and subtypes in the microbial world beyond the completely sequenced genomes.
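The filtering and tallying step can be sketched as follows. For brevity, the sketch keeps a single best-scoring hit per target protein, which simplifies the "best non-overlapping hits" rule described above; the dictionary keys, the function name and the example records are hypothetical.

```python
from collections import Counter

def tally_subtypes(hits, min_query_cov=0.75, max_evalue=1e-4):
    """Count target proteins per subtype after coverage and e-value filtering.

    hits: iterable of dicts with keys 'target', 'subtype', 'evalue',
    'qcov' (fraction of the query covered) and 'score'.
    """
    best = {}
    for h in hits:
        if h["evalue"] > max_evalue or h["qcov"] < min_query_cov:
            continue
        current = best.get(h["target"])
        # Simplification: one best hit per target, chosen by score.
        if current is None or h["score"] > current["score"]:
            best[h["target"]] = h
    return Counter(h["subtype"] for h in best.values())

hits = [
    {"target": "t1", "subtype": "I-E", "evalue": 1e-50, "qcov": 0.90, "score": 200},
    {"target": "t1", "subtype": "I-F", "evalue": 1e-30, "qcov": 0.80, "score": 150},
    {"target": "t2", "subtype": "V-A", "evalue": 1e-20, "qcov": 0.60, "score": 120},
    {"target": "t3", "subtype": "V-A", "evalue": 1e-25, "qcov": 0.95, "score": 140},
]
counts = tally_subtypes(hits)
```

Each target is attributed to the subtype of its best-scoring query, so the resulting counts approximate relative abundance by query provenance.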

Assembly of CRISPR–cas genomic loci

For the genes encoding all Cas proteins and cyclic oligonucleotide-based anti-phage signalling systems, genomic neighbourhoods of up to ten flanking genes from each side, including predicted CRISPR arrays, were defined. Neighbourhoods containing at least one core cas gene (cas1–cas14) were retrieved. Cas12 proteins were identified and classified separately using the Cas12/TnpB phylogenetic tree (data file 4 in ref. ¹⁵). Such analysis is required to assign subtypes and to distinguish CRISPR effectors from similar TnpB-like RNA-guided nucleases and candidate systems (collectively referred to as subtype V-U)^8,38,39. Cas13 proteins were identified using the respective profiles and manually curated to remove false positives. Annotation of protein-coding genes with type- and subtype-specific profiles was used to classify CRISPR–Cas systems (data files 5, 6 and 8 in ref. ¹⁵). Boundaries of CRISPR–cas loci were defined to include all CRISPR and Cas10-associated cyclic oligonucleotide-based anti-phage signalling systems genes and members of their respective directons at the locus margins.
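The definition of genomic neighbourhoods can be illustrated with a short sketch that takes windows of up to ten genes on each side of every anchor gene. Merging of overlapping windows into a single neighbourhood and the treatment of the contig as linear are assumptions of this sketch, and the function name is hypothetical.

```python
def neighbourhoods(n_genes, anchors, flank=10):
    """Return (start, end) gene-index windows around anchor genes.

    n_genes: number of genes on the contig (indices 0 .. n_genes - 1)
    anchors: indices of cas (or other anchor) genes
    Windows are truncated at contig ends; overlapping windows are merged
    (assumption, not stated in the text).
    """
    windows = sorted(
        (max(0, a - flank), min(n_genes - 1, a + flank)) for a in anchors
    )
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(w) for w in merged]

# Two nearby anchors collapse into one neighbourhood; a distant one stays separate.
loci = neighbourhoods(100, [3, 8, 60])
```

The "up to" in the text corresponds to the truncation at contig ends, where fewer than ten flanking genes are available.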

For CRISPR–Cas systems poorly represented in the complete genome database used here (subtypes II-D, III-H, III-I, V-L, V-N, V-O, V-P, V-Q, VI-E, VI-F and VII), genome neighbourhoods were retrieved from the NCBI nucleotide sequence database. The anchors for these loci were obtained by searching the corresponding effector protein profiles against the NCBI clustered NR database (Cas10 for III-H, and Cas7-like and Cas10-like for III-I).

Genome weights

To mitigate the massively uneven representation of prokaryotic clades in the complete genome database, a tree-based genome weighting scheme was used⁷⁴. The analysis of the representation of CRISPR–Cas systems was based on weights obtained for all 47,545 complete genomes. The representation of CRISPR–Cas systems with respect to temperature preferences was analysed using the weights for the subset of 24,711 genomes with known optimal growth temperatures. For the estimation of the relative abundances of CRISPR–Cas systems, weights were derived from the subtree of the 16,898 genomes encoding at least one complete CRISPR–Cas system.
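Once per-genome weights are available, a weighted prevalence is simply the weight of the genomes carrying a system divided by the total weight. The sketch below assumes the weights have already been derived from the genome tree (the weighting scheme itself is not reproduced here); the function name, genome identifiers and weights are hypothetical.

```python
def weighted_prevalence(weights, positive_genomes):
    """Fraction of total genome weight contributed by genomes with a system.

    weights: dict mapping genome id -> tree-based weight (assumed precomputed)
    positive_genomes: set of genome ids encoding a complete CRISPR-Cas system
    """
    total = sum(weights.values())
    positive = sum(w for genome, w in weights.items() if genome in positive_genomes)
    return positive / total

# Toy example: g2 and g3 are close relatives and are therefore down-weighted.
weights = {"g1": 1.0, "g2": 0.25, "g3": 0.25, "g4": 0.5}
prevalence = weighted_prevalence(weights, {"g1", "g2"})
```

In this toy case the unweighted prevalence would be 2 of 4 genomes, whereas the weighted value differs because densely sampled relatives count for less, which is how adjusted percentages can depart from raw genome counts.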

Structural modelling and comparison

Modelling of individual Cas proteins and extraction of core domains

Structural models of representative Cas7, Cas8 and Cas10 proteins of different type I and III subtypes were predicted using AlphaFold2 (default parameters)⁶⁸. Core domains of Cas8 and Cas10 (both AlphaFold2 predictions and from PDB) have been extracted by identifying fused Cas3 and Cas11 by structure comparison. To do so, full Cas8 and Cas10 structures have been compared against individual Cas3 and Cas11 structures available in PDB by foldseek. After mapping, fused Cas3, Cas5 and Cas11 domains were removed, if applicable, and the core Cas8 and Cas10 proteins kept. Core structures were verified manually afterwards and used for Cas8 and Cas10 comparison (data file 1 in ref. ¹⁵).

Structure comparison of Cas8 and Cas10 proteins

Core Cas8 and Cas10 domains from selected representatives of type I and III subtypes were extracted from either PDB structures or AlphaFold2 (ref. ⁶⁸) models and compared all-versus-all using DALI⁶⁹ (data file 4 in ref. ¹⁵). Pairwise root mean square deviation values were used to construct a UPGMA dendrogram as follows: (1) the pairwise root mean square deviation values were normalized to the minimum of the self-scores and converted to a distance matrix on the natural logarithmic scale and (2) the UPGMA dendrogram was reconstructed from this distance matrix using the R function hclust with the argument ‘method’ set to average (=UPGMA).

Structure comparison of Cas7 domains

Individual Cas7 domains were extracted from III-I Cas7-11i (WP_207677910), III-D Cas7 (8bmw), III-E Cas7-11e (7zol) and AlphaFold2 predicted III-D2 Cas7-11c (GwCas7-11c) and compared all-versus-all by DALI. Pairwise root mean square deviation values were used to construct a UPGMA dendrogram using the R function hclust with the argument ‘method’ set to average (=UPGMA).

Modelling of CRISPR effector complexes

CRISPR effector complexes of subtypes III-G, III-H and III-I have been predicted with AlphaFold3⁷⁵ (default settings) using the following input sequences: III-G (NC_012623.1), S. islandicus Y.N.15.51, 7×Cas7_1, 1×Cas7_2, 1×Cas5, 1×Cas10, crRNA (48 nt) and crRNA (38 nt); III-H (RPGO01000026), ‘Candidatus A. ethanivorans’ isolate Eth-Arch1, AEth_01085, 7×Cas7, 1×Cas5 and 1×Cas7, RNA as for III-G; III-I, (D. magnum WP_207677910 and WP_207677911.1), 1×Cas7-11i and 1×Cas10; and RNA from type III-E of D. magnum (PDB 7zol). Predicted complex structures are available in data file 1 in ref. ¹⁵.

Criteria for identification and classification of CRISPR–cas systems

As in previous classifications of CRISPR–Cas systems, the main approach for their identification was the search of protein sequences encoded in bacterial and archaeal genomes using position-specific iterated BLAST (PSI-BLAST)⁶¹. The queries for these searches were position-specific scoring matrices (PSSMs) generated from multiple alignments of the 14 core Cas proteins and additional ancillary proteins strongly associated with CRISPR–Cas systems. The use of PSSMs rather than individual protein sequences as queries is essential for reliable identification of CRISPR–Cas systems due to the typically high evolution rate of cas genes⁷⁶. Together, for this work, we assembled a set of 915 PSSMs covering cas and other associated genes including 293 newly constructed and curated ones that were used to identify 146,993 core cas genes (cas1–cas14) in 47,545 completely sequenced bacterial and archaeal genomes that were available at the NCBI in November 2023 (Fig. 1a, ‘The prokaryotic genome database’ section and data files 5, 6 and 8 in ref. ¹⁵). For most of the core Cas protein families, multiple PSSMs had to be used because high sequence divergence did not allow reliable recognition with a single PSSM. For all core genes, similarity dendrograms or phylogenetic trees were constructed to assess the relationships among the Cas proteins in the respective families (data file 4 in ref. ¹⁵). In addition, representatives of several subtypes and variants of type V systems were missing in the analysed genome collection, so the sequences were extracted from recent publications and used as queries to search the GenBank (‘Protein annotation’ section). After extensive manual curation of the results, this search yielded 21,474 complete (that is, those containing all core as well as subtype-specific signature genes of the effector module) CRISPR–Cas systems that were assigned to types and subtypes (Extended Data Figs. 1–4 and data file 8 in ref. ¹⁵). 
The loci that could not be confidently assigned to any of the previously identified subtypes were designated representatives of new subtypes or variants. New variants identified in recent genome and metagenome analyses but missing in the complete genome collection were added to the list of CRISPR–cas loci (Extended Data Figs. 1–4 and data file 8 in ref. ¹⁵). The substantially updated collection of multiple alignments and PSSMs for Cas protein families that was developed and employed in this work is available in data files 5 and 6 in ref. ¹⁵. These alignments and/or PSSMs can be used as tools for identification and classification of CRISPR–cas loci in sequenced genomes and metagenomes. Several conflicts with the published classification of CRISPR–Cas systems are listed in Supplementary Table 3.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All the data used and generated in this work are available at https://doi.org/10.5281/zenodo.17388109 (ref. ¹⁵).

References

Komor, A. C., Badran, A. H. & Liu, D. R. CRISPR-based technologies for the manipulation of eukaryotic genomes. Cell 168, 20–36 (2017).
Shivram, H., Cress, B. F., Knott, G. J. & Doudna, J. A. Controlling and enhancing CRISPR systems. Nat. Chem. Biol. 17, 10–19 (2021).
Mohanraju, P. et al. Diverse evolutionary roots and mechanistic variations of the CRISPR–Cas systems. Science 353, aad5147 (2016).
Hille, F. et al. The biology of CRISPR–Cas: backward and forward. Cell 172, 1239–1259 (2018).
McGinn, J. & Marraffini, L. A. Molecular mechanisms of CRISPR–Cas spacer acquisition. Nat. Rev. Microbiol. 17, 7–12 (2019).
Nussenzweig, P. M. & Marraffini, L. A. Molecular mechanisms of CRISPR–Cas immunity in bacteria. Annu. Rev. Genet. 54, 93–120 (2020).
Wang, J. Y., Pausch, P. & Doudna, J. A. Structural biology of CRISPR–Cas immunity and genome editing enzymes. Nat. Rev. Microbiol. 20, 641–656 (2022).
Makarova, K. S. et al. Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat. Rev. Microbiol. 18, 67–83 (2020).
Altae-Tran, H. et al. Uncovering the functional diversity of rare CRISPR–Cas systems with deep terascale clustering. Science 382, eadi1910 (2023).
Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712 (2007).
Makarova, K. S. et al. Evolution and classification of the CRISPR–Cas systems. Nat. Rev. Microbiol. 9, 467–477 (2011).
Makarova, K. S. et al. An updated evolutionary classification of CRISPR–Cas systems. Nat. Rev. Microbiol. 13, 722–736 (2015).
Adler, B. A. et al. CasPEDIA database: a functional classification system for class 2 CRISPR–Cas enzymes. Nucleic Acids Res. 52, D590–D596 (2024).
Yang, J. et al. Structural basis for the activity of the type VII CRISPR–Cas system. Nature 633, 465–472 (2024).
Makarova, K. S. et al. An updated evolutionary classification of CRISPR–Cas systems including rare variants. Zenodo https://doi.org/10.5281/zenodo.17388109 (2025).
Zhang, C. et al. Mechanisms for HNH-mediated target DNA cleavage in type I CRISPR–Cas systems. Mol. Cell 84, 3141–3153 (2024).
Hirano, S., Altae-Tran, H., Kannan, S., Macrae, R. K. & Zhang, F. Structural determinants of DNA cleavage by a CRISPR HNH-cascade system. Mol. Cell 84, 3154–3162 (2024).
Stoddard, B. L. Homing endonuclease structure and function. Q. Rev. Biophys. 38, 49–95 (2005).
Benz, F. et al. Type IV-A3 CRISPR–Cas systems drive inter-plasmid conflicts by acquiring spacers in trans. Cell Host Microbe 32, 875–886 (2024).
Rybarski, J. R., Hu, K., Hill, A. M., Wilke, C. O. & Finkelstein, I. J. Metagenomic discovery of CRISPR-associated transposons. Proc. Natl Acad. Sci. USA 118, e2112279118 (2021).
Faure, G. et al. Modularity and diversity of target selectors in Tn7 transposons. Mol. Cell 83, 2122–2136 (2023).
Hsieh, S. C. & Peters, J. E. Discovery and characterization of novel type I-D CRISPR-guided transposons identified among diverse Tn7-like elements in cyanobacteria. Nucleic Acids Res. 51, 765–782 (2023).
Hsieh, S. C. et al. Telomeric transposons are pervasive in linear bacterial genomes. Science 387, eadp1973 (2025).
Pinilla-Redondo, R. et al. Type IV CRISPR–Cas systems are highly diverse and involved in competition between plasmids. Nucleic Acids Res. 48, 2000–2012 (2020).
Moya-Beltrán, A. et al. Evolution of type IV CRISPR–Cas systems: insights from CRISPR loci in integrative conjugative elements of Acidithiobacillia. CRISPR J. 4, 656–672 (2021).
Hoikkala, V., Graham, S. & White, M. F. Bioinformatic analysis of type III CRISPR systems reveals key properties and new effector families. Nucleic Acids Res. 52, 7129–7141 (2024).
Wiegand, T. et al. Functional and phylogenetic diversity of Cas10 proteins. CRISPR J. 6, 152–162 (2023).
Ozcan, A. et al. Programmable RNA targeting with the single-protein CRISPR effector Cas7–11. Nature 597, 720–725 (2021).
Stella, G. & Marraffini, L. Type III CRISPR–Cas: beyond the Cas10 effector complex. Trends Biochem. Sci. 49, 28–37 (2024).
Athukoralage, J. S., Rouillon, C., Graham, S., Gruschow, S. & White, M. F. Ring nucleases deactivate type III CRISPR ribonucleases by degrading cyclic oligoadenylate. Nature 562, 277–280 (2018).
Athukoralage, J. S. & White, M. F. Cyclic oligoadenylate signalling and regulation by ring nucleases during type III CRISPR defence. RNA 27, 855–867 (2021).
Chi, H. et al. Antiviral type III CRISPR signalling via conjugation of ATP and SAM. Nature 622, 826–833 (2023).
Makarova, K. S. et al. Evolutionary and functional classification of the CARF domain superfamily, key sensors in prokaryotic antivirus defense. Nucleic Acids Res. 48, 8828–8847 (2020).
Gruschow, S., Adamson, C. S. & White, M. F. Specificity and sensitivity of an RNA targeting type III CRISPR complex coupled with a NucC endonuclease effector. Nucleic Acids Res. 49, 13122–13134 (2021).
Hoikkala, V., Chi, H., Gruschow, S., Graham, S. & White, M. F. Diversity and abundance of ring nucleases in type III CRISPR-cas loci. Philos. Trans. R. Soc. Lond. B 380, 20240084 (2025).
Garcia-Doval, C. et al. Activation and self-inactivation mechanisms of the cyclic oligoadenylate-dependent CRISPR ribonuclease Csm6. Nat. Commun. 11, 1596 (2020).
Altae-Tran, H. et al. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases. Science 374, 57–65 (2021).
Altae-Tran, H. et al. Diversity, evolution, and classification of the RNA-guided nucleases TnpB and Cas12. Proc. Natl Acad. Sci. USA 120, e2308224120 (2023).
Tordoff, J. et al. Initial characterization of twelve new subtypes and variants of type V CRISPR systems. CRISPR J. 8, 149–154 (2025).
Yan, W. X. et al. Functionally diverse type V CRISPR–Cas systems. Science 363, 88–91 (2018).
Dmytrenko, O. et al. Cas12a2 elicits abortive infection through RNA-triggered destruction of dsDNA. Nature 613, 588–594 (2023).
Huang, C. J., Adler, B. A. & Doudna, J. A. A naturally DNase-free CRISPR–Cas12c enzyme silences gene expression. Mol. Cell 82, 2148–2160 (2022).
Wu, W. Y. et al. The miniature CRISPR–Cas12m effector binds DNA to block transcription. Mol. Cell 82, 4487–4502 (2022).
Bigelyte, G. et al. Innate programmable DNA binding by CRISPR–Cas12m effectors enable efficient base editing. Nucleic Acids Res. 52, 3234–3248 (2024).
Charpentier, E., Richter, H., van der Oost, J. & White, M. F. Biogenesis pathways of RNA guides in archaeal and bacterial CRISPR–Cas adaptive immunity. FEMS Microbiol. Rev. 39, 428–441 (2015).
Faure, G. et al. Comparative genomics and evolution of trans-activating RNAs in class 2 CRISPR–Cas systems. RNA Biol. 16, 435–448 (2018).
Zilberzwige-Tal, S. et al. Reprogrammable RNA-targeting CRISPR systems evolved from RNA toxin-antitoxin. Cell 188, 1925–1940 (2025).
Hu, Y. et al. Metagenomic discovery of novel CRISPR–Cas13 systems. Cell Discov. 8, 107 (2022).
Smargon, A. A. et al. Cas13b is a type VI-B CRISPR-associated RNA-guided RNase differentially regulated by accessory proteins Csx27 and Csx28. Mol. Cell 65, 618–630 (2017).
VanderWal, A. R. et al. Csx28 is a membrane pore that enhances CRISPR–Cas13b-dependent antiphage defense. Science 380, 410–415 (2023).
Chylinski, K., Makarova, K. S., Charpentier, E. & Koonin, E. V. Classification and evolution of type II CRISPR–Cas systems. Nucleic Acids Res. 42, 6091–6105 (2014).
Zhang, S. et al. Pro-CRISPR PcrIIC1-associated Cas9 system for enhanced bacterial immunity. Nature 630, 484–492 (2024).
Georjon, H. & Bernheim, A. The highly diverse antiphage defence systems of bacteria. Nat. Rev. Microbiol. 21, 686–700 (2023).
Faure, G. et al. CRISPR–Cas in mobile genetic elements: counter-defence and beyond. Nat. Rev. Microbiol. 17, 513–525 (2019).
Koonin, E. V. & Makarova, K. S. CRISPR in mobile genetic elements: counter-defense, inter-element competition and RNA-guided transposition. BMC Biol. 22, 295 (2024).
Koonin, E. V., Makarova, K. S., Wolf, Y. I. & Krupovic, M. Evolutionary entanglement of mobile genetic elements and host defence systems: guns for hire. Nat. Rev. Genet. 21, 119–131 (2020).
Koonin, E. V. & Makarova, K. S. Evolutionary plasticity and functional versatility of CRISPR systems. PLoS Biol. 20, e3001481 (2022).
Li, W. et al. Discovering CRISPR–Cas system with self-processing pre-crRNA capability by foundation models. Nat. Commun. 15, 10024 (2024).
Steinegger, M. & Soding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Couvin, D. et al. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 46, W246–W251 (2018).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Marchler-Bauer, A. et al. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 43, D222–D226 (2015).
Deorowicz, S., Debudaj-Grabysz, A. & Gudys, A. FAMSA: fast and accurate multiple sequence alignment of huge protein families. Sci. Rep. 6, 33964 (2016).
Esterman, E. S., Wolf, Y. I., Kogay, R., Koonin, E. V. & Zhaxybayeva, O. Evolution of DNA packaging in gene transfer agents. Virus Evol. 7, veab015 (2021).
Dehal, P. S. et al. MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res. 38, D396–D400 (2010).
Neri, U. et al. Expansion of the global RNA virome reveals diverse clades of bacteriophages. Cell 185, 4023–4037 (2022).
Zimmermann, L. et al. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J. Mol. Biol. 430, 2237–2243 (2018).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Holm, L. DALI and the persistence of protein shape. Protein Sci. 29, 128–140 (2020).
Edgar, R. C. Muscle5: high-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat. Commun. 13, 6968 (2022).
Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
Xu, C. et al. Programmable RNA editing with compact CRISPR–Cas13 systems from uncultivated microbes. Nat. Methods 18, 499–506 (2021).
Kannan, S. et al. Compact RNA editors with small Cas13 proteins. Nat. Biotechnol. 40, 194–197 (2022).
Galperin, M. Y., Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43, D261–D269 (2015).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Takeuchi, N., Wolf, Y. I., Makarova, K. S. & Koonin, E. V. Nature and intensity of selection pressure on CRISPR-associated genes. J. Bacteriol. 194, 1216–1225 (2012).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

Acknowledgements

K.S.M., S. A. Shmakov, Y.I.W., P.M., D.H.H. and E.V.K. are supported through the Intramural Research Program of the National Institutes of Health of the USA (National Library of Medicine). S. A. Shmakov is supported by funding of the Basic Research Program by the National Research University Higher School of Economics. C.L.B. is supported by European Research Council grants 865973 and 101158249. D.C. and J.T. are supported by Arbor Biotechnologies. P.H. is supported by IFF. S.M. was supported by funding from the Natural Sciences and Engineering Research Council of Canada (Discovery program) and holds a Tier 1 Canada Research Chair in Bacteriophages. F.J.M.M. is supported by the grants PID2023-150750NB-I00 (funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU), PROMETEU/2021/057 (funded by Conselleria d’Educació, Cultura, Universitats i Ocupació, Generalitat Valenciana, Spain) and INNEST/2024/427 (funded by Agencia Valenciana de Innovación—IVACE + I Innovación—and by the European Union through the ERDF Program of the Valencian Community 2021–2027). P.P. receives funding from the EMBC under EMBO Installation Grant (5342-2023) and is supported by the LMT under the ‘University Excellence Initiatives’ (measure number 12-001-01-01-01, project S-A-UEI-23-10). R.P.-R. is supported by a Lundbeck Fonden grant (R347-2020-2346) and a research grant from VILLUM FONDEN (VIL60763). S. A. Shah is a recipient of a Novo Nordisk Foundation project grant in basic bioscience (NNF18OC0052965). C.V. is supported by intramural funds of the Vilnius University. M.P.T. is supported by the National Institutes of Health (grant R35GM118160). A.F.Y. is supported by an Environmental Biotechnology Innovation Centre (EBIC), UKRI Engineering Biology Mission Hub grant (BB/Y008332/1). F.Z. is supported by Howard Hughes Medical Institute; Yang Tan Collective; Broad Institute Programmable Therapeutics Gift Donors; Pershing Square Foundation, William Ackman, and Neri Oxman; Phillips family; J. and P. Poitras; B.T. Charitable Trust. R. Backofen is supported by the German Research Foundation (SFB 1597/1 ‘Small Data’).

Author information

These authors contributed equally: Kira S. Makarova, Sergey A. Shmakov.

Authors and Affiliations

Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Kira S. Makarova, Sergey A. Shmakov, Yuri I. Wolf, Pascal Mutz & Eugene V. Koonin
National Research University Higher School of Economics, Moscow, Russia
Sergey A. Shmakov
Institute for Protein Design, University of Washington, Seattle, WA, USA
Han Altae-Tran
Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), Würzburg, Germany
Chase L. Beisel
Medical Faculty, University of Würzburg, Würzburg, Germany
Chase L. Beisel
Kavli Institute of Nanoscience, Department of Bionanoscience, Delft University of Technology, Delft, The Netherlands
Stan J. J. Brouns
Max Planck Unit for the Science of Pathogens, Humboldt University, Berlin, Germany
Emmanuelle Charpentier
Arbor Biotechnologies, Cambridge, MA, USA
David Cheng & Jesse Tordoff
Department of Chemistry, University of California, Berkeley, CA, USA
Jennifer Doudna
Innovative Genomics Institute, University of California, Berkeley, CA, USA
Jennifer Doudna
Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
Jennifer Doudna
California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
Jennifer Doudna
Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
Jennifer Doudna
Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
Jennifer Doudna
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Daniel H. Haft
IFF, Dangé-Saint-Romain, France
Philippe Horvath
Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec City, Quebec, Canada
Sylvain Moineau
Departamento de Fisiología, Genética y Microbiología, Universidad de Alicante, Alicante, Spain
Francisco J. M. Mojica
LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
Patrick Pausch
Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
Rafael Pinilla-Redondo
Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Gentofte, Denmark
Shiraz A. Shah
Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
Virginijus Siksnys & Česlovas Venclovas
Biochemistry and Molecular Biology, Genetics and Microbiology, University of Georgia, Athens, GA, USA
Michael P. Terns
Biomedical Sciences Research Complex, University of St. Andrews, St. Andrews, UK
Malcolm F. White
Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
Alexander F. Yakunin
Centre for Environmental Biotechnology, School of Natural Sciences, Bangor University, Bangor, UK
Alexander F. Yakunin
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Feng Zhang
McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
Feng Zhang
Howard Hughes Medical Institute, Cambridge, MA, USA
Feng Zhang
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
Feng Zhang
Yang Tan Collective, Massachusetts Institute of Technology, Cambridge, MA, USA
Feng Zhang
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Feng Zhang
Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
Feng Zhang
Archaea Centre, Department of Biology, Copenhagen University, Copenhagen, Denmark
Roger A. Garrett
BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, Freiburg, Germany
Rolf Backofen
Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands
John van der Oost
Department of Food, Bioprocessing, and Nutrition Sciences, North Carolina State University, Raleigh, NC, USA
Rodolphe Barrangou

Contributions

K.S.M., S. A. Shmakov, Y.I.W., P.M., P.P. and E.V.K. researched the data for the article. K.S.M., Y.I.W. and E.V.K. wrote the article. K.S.M., S. A. Shmakov, Y.I.W., P.M., H.A.-T., C.L.B., S.J.J.B., E.C., D.C., J.D., D.H.H., P.H., S.M., F.J.M.M., P.P., R.P.-R., S. A. Shah, V.S., M.P.T., J.T., Č.V., M.F.W., A.F.Y., F.Z., R.A.G., R. Backofen, J.v.d.O., R. Barrangou and E.V.K. substantially contributed to the discussion of the content, and edited and approved the paper.

Corresponding author

Correspondence to Eugene V. Koonin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Microbiology thanks Thomson Hallmark and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Updated classification of type I CRISPR–Cas systems.

The figure schematically shows representative (typical) CRISPR–cas loci for each class 1 subtype and selected distinct variants, with the dendrogram on the left showing the likely evolutionary relationships between the types and subtypes. The column on the right indicates the organism and the corresponding gene range. Homologous genes are colour-coded and identified by a family name. The gene names follow the previous classification⁸. The pink shading shows the effector module. The grey shading of different hues shows the two levels of classification, subtypes and variants. Where both a systematic name and a legacy name are commonly used, the legacy name is given under the systematic name. DNA and RNA are the molecules targeted by the CRISPR–Cas systems.

Extended Data Fig. 2 Updated classification of CRISPR types III, IV and VII.

Designations are the same as in Extended Data Fig. 1. Additional subunits of effector complexes are shown as grey arrows. Most of the subtype III-B, III-C, III-E and III-F loci, as well as the IV-B and IV-C loci, lack CRISPR arrays and are shown accordingly, although exceptions have been detected for each of these type III subtypes. The dashed line leading to type III-E indicates its likely origin from III-D2. Abbreviations: CHAT, protease domain of the caspase family; TPR, tetratricopeptide repeats; RT, reverse transcriptase. DNA and RNA are the molecules targeted by the systems.

Extended Data Fig. 3 Updated classification of CRISPR types II and VI.

a,b, The figure schematically shows representative (typical) CRISPR–cas loci of each type II (a) and type VI subtype (b) and selected distinct variants, with the dendrogram on the left showing the likely evolutionary relationships between the types and subtypes. The column on the right indicates the organism and the corresponding gene range. Homologous genes are colour-coded and identified by a family name following the previous classification⁸. Where both a systematic name and a legacy name are commonly used, the legacy name is given under the systematic name. The grey shading of different hues shows the two levels of classification, subtypes and variants. The adaptation module genes cas1 and cas2 are present in only a subset of the subtype VI-A and VI-D loci and are accordingly shown by dashed lines. The WYL-domain-encoding genes and csx27 genes are also dispensable and thus shown by dashed lines. Additional genes encoding components of the interference module, such as tracrRNA, are shown. The domains of the effector proteins are colour-coded: RuvC-like nuclease, green; HNH nuclease, yellow; HEPN RNase, purple; transmembrane domains, blue. DNA and RNA are the molecules targeted by the systems. c, Deep relationships among type VI effector families. Profile–profile comparisons were performed and the UPGMA dendrogram was constructed as described in Methods. The tree is based on the most conserved C-terminal HEPN domain alignments only. HHsearch was run with the minimum length coverage for hits set to l = 0.33, -u = 2.3 -gcut = 0.667. Multiple alignments (profiles) of the C-terminal HEPN domains are available in data file 5 in ref. ¹⁵, and the original tree is available in data file 4 in ref. ¹⁵. The dashed line corresponds to the tree depth D between 1.5 and 2 (D = 2 roughly corresponds to the pairwise HHsearch similarity score of exp(−2D) ≈ 0.02 relative to the self-score) and separates most of the subtypes assigned previously or in this work. New subtypes are highlighted in blue. d, Organization of representative loci for distinct variants of subtype VI-B. Designations are the same as in b.

Extended Data Fig. 4 Updated classification of type V CRISPR–Cas systems.

a, Schematics of organization of type V CRISPR–Cas systems. The figure schematically shows representative (typical) CRISPR–cas loci of each type V subtype and distinct variant that has been experimentally characterized and/or described in the previous classification, with the dendrogram on the left showing the likely evolutionary relationships between the types and subtypes. The column on the right indicates the organism and the corresponding gene range. Homologous genes are colour-coded and identified by a family name following the previous classification⁸. Where both a systematic name and a legacy name are commonly used, the legacy name is given under the systematic name. The grey shading of different hues shows the two levels of classification, subtypes and variants. The adaptation module genes cas1 and cas2 are present in only a subset of the type V subtypes. Dispensable (and/or missing in some subtypes and variants) components are indicated by dashed outlines. Additional genes encoding components of the interference module, such as tracrRNA, are shown. The domains of the effector proteins are colour-coded: RuvC-like nuclease (RuvC motifs I, II, III), green. b, Deep relationships between type V effector families. Profile–profile comparisons (RuvC-domain only) were performed and the UPGMA dendrogram was constructed as described in Methods. Specifically, HHsearch was run with the minimum length coverage for hits set to 0.033, -u = 2.3 -gcut = 0.667. Multiple alignments (profiles) used for this analysis are available in data file 5 in ref. ¹⁵ and the original tree is available in data file 4 in ref. ¹⁵. The dashed line corresponds to the tree depth D between 1.5 and 2 (D = 2 roughly corresponds to the pairwise HHsearch similarity score of exp(−2D) ≈ 0.02 relative to the self-score) and separates most of the subtypes that were assigned previously or in this work. New subtypes are highlighted in blue. IS605 (magenta) stands for the TnpB RNA-guided nuclease encoded by IS605 family transposons and not associated with CRISPR arrays.

Extended Data Fig. 5

AlphaFold 3 models for III-G, III-H and III-I CRISPR effector complexes compared with solved structures of III-D1, III-E and VII effector complexes. The models are the same as schematically shown in Fig. 2b. Distinct subunits are coloured as in Fig. 2b.

Extended Data Fig. 6 CRISPR–Cas subtype III-I.

a, Comparison of the organization of representative (typical) III-D2, III-E and III-I loci. Cas7 and Cas11 domains within large multidomain proteins are shown by boxes within the respective arrows. b, Dendrograms for individual Cas7 domains. The larger tree is a hybrid UPGMA/FastTree tree built for all best hits obtained by PSI-BLAST searches with the three individual Cas7 domains of Cas7-11i used as queries. The smaller UPGMA dendrogram was built using a matrix of pairwise rmsd scores obtained by DALI comparison of individual Cas7 domains from the Cas7-11i AF3 model, the Cas7-11e structure (7zol), the AF3 model for GwCas7*3 28 and the Cas7 domain from the III-D effector complex structure (8bmw). Underneath the trees, a scheme of similarity between the Cas7 domains of Cas7-11i and Cas7-11e is shown. The solid line indicates Cas7 domains with structural similarity and the dashed line shows domains with significant sequence similarity. c, AF3 model for Cas7-11i (WP_207677910.1) and Cas10i (WP_207677911.1) complexed with crRNA from subtype III-D. The Cas7 and putative Cas11 domains are coloured; Cas10i is shown in blue. d, Structural alignment of the AF2 model of Cas10i and the Cas10d structure (8s9t-C). Below the DALI structure-guided alignment, the alignment of the catalytic motif (GGDD) of Cas10 and the corresponding region of Cas10i is shown, demonstrating the disruption of the catalytic site in the latter.

Extended Data Fig. 7 Inactivated type V-B variant Cas12b3.

a, Genetic organization of the Actinomyces sp. conjugative plasmid region encoding type IV-B and the B3 variant of subtype V-B. Plasmid-related genes are shown in brown. Other genes (black) are mostly uncharacterized or unrelated to CRISPR or known plasmid genes. b, Schematic representation of HHpred search results and substitution of key amino acids of the RuvC-I and RuvC-II sites in Cas12b3. c, Multiple alignment of mini CRISPR arrays associated with the V-B3 variant. CRISPR repeats are shown in red. Genome names and accession numbers are indicated above the alignment.

Extended Data Fig. 8 Hypothetical scenario for the origins and evolution of CRISPR–Cas systems.

The figure is an amended version of Fig. 6 from the 2020 classification of CRISPR–Cas systems⁸. The key evolutionary events are described to the right of the images. The multiforking arrows denote events that have been inferred to have occurred on multiple, independent occasions during the evolution of CRISPR–Cas systems. Additional abbreviations: “GGDD”, key catalytic motif of the cyclase/polymerase domain of Cas10 that is involved in the synthesis of cOA; TR, terminal repeats.

Extended Data Fig. 9 Example of Cas9 shuffling in situ and estimate of the shuffling frequency.

a, Phylogenetic analysis of subtype II-C genes from Flavobacteriales. The Cas1 phylogenetic tree is shown on the left and the Cas9 tree is shown on the right. Both trees were constructed using FastTree as described in Methods. Species of different genera are shown in different colours. Arrows indicate several outstanding examples of cas9 exchange in situ (cases in which closely related cas1 genes are associated with distantly related cas9 genes). The loci schematics and percent identity for selected genes are shown. b, Estimated shuffling rate of adaptation versus effector genes.

Extended Data Fig. 10 Deep relationships among structures of large subunits of type I effector complexes.

a, Relationships between the structures of type I large subunit representatives (Cas8 and Cas10d families). Cas8 and Cas10d correspond to distinct profiles/subfamilies (data files 5 and 6 in ref. ¹⁵), and representatives of each subfamily were modelled with AF2 (ref. ⁶⁸; with max_template_date = 2024-12-12). In addition, resolved structures of Cas8 and Cas10d were retrieved from PDB⁷⁷. Additional domains present in some of the Cas8 and Cas10d proteins (such as Cas5, Cas3 and Cas11) were identified and trimmed off the structure to keep the respective core structures. These core structures were compared all against all using DALI⁶⁹. Pairs without detectable similarity (no DALI z-score reported) were set artificially to a z-score of 0.1. The pairwise DALI z-scores were normalized by the minimum of the self-scores and converted to a distance matrix on the natural log scale. The UPGMA dendrogram was reconstructed from this distance matrix. A depth of ~1 (red dashed line) corresponds to a pairwise z-score of ~7.5–9. Profile IDs are indicated after the vertical bar. b, Profile–profile comparisons for large subunits of type I effector complexes (Cas8 and Cas10d families) were performed and the UPGMA dendrogram was constructed as described in Methods. HHsearch was run with the minimum length coverage for hits set to l = 0.33, -u = 2.3 -gcut = 0.667. Multiple alignments (profiles) used for this analysis are available in data files 5 and 6 in ref. ¹⁵. The dashed line corresponds to the tree depth D = 2, roughly corresponding to the pairwise HHsearch similarity score of exp(−2D) ≈ 0.02 relative to the self-score. Typically, this tree depth reflects reliable sequence similarity.
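
The score-to-distance conversion and UPGMA clustering described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The formula d = −ln(z_ij / min(z_ii, z_jj)) is an assumed reading of "normalized by the minimum of the self-scores and converted to a distance matrix on the natural log scale", and the function names and toy scores are hypothetical. Under this reading, a tree depth D equals half the distance at which two clusters merge, which is consistent with a relative similarity of exp(−2D) at depth D.

```python
import math


def scores_to_distances(scores, floor=0.1):
    """Convert a symmetric similarity-score matrix to log-scale distances.

    Assumed formula: d_ij = -ln(s_ij / min(s_ii, s_jj)).
    Pairs with no reported score (None) are set to `floor` (0.1 in the caption).
    Only the upper triangle of `scores` is read.
    """
    n = len(scores)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = scores[i][j] if scores[i][j] is not None else floor
            norm = min(scores[i][i], scores[j][j])
            dist[i][j] = dist[j][i] = max(0.0, -math.log(s / norm))
    return dist


def upgma(dist, labels):
    """Plain UPGMA. Returns (nested-tuple tree, merge depths in merge order)."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    depths = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters; keys always satisfy a < b.
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        del d[(a, b)]
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        depths.append(d_ab / 2.0)  # node depth is half the merge distance
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average of distances (the UPGMA update rule).
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree, depths


# Toy input (hypothetical values): the A-B score is chosen so that its
# distance is 2, i.e. the pair merges at depth 1; C has no detectable
# similarity to A or B.
labels = ["A", "B", "C"]
scores = [
    [50.0, 50.0 * math.exp(-2), None],
    [50.0 * math.exp(-2), 50.0, None],
    [None, None, 40.0],
]
tree, depths = upgma(scores_to_distances(scores), labels)
print(tree, depths)
```

Under the assumed formula, a merge at depth 1 corresponds to a pairwise score of exp(−2) ≈ 0.135 of the smaller self-score, so the reported z-scores of ~7.5–9 at that depth would imply self-scores of roughly 55–66.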

Supplementary information

Supplementary Information

Supplementary Notes, Discussion and Tables 1–3.

Reporting Summary

Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Makarova, K.S., Shmakov, S.A., Wolf, Y.I. et al. An updated evolutionary classification of CRISPR–Cas systems including rare variants. Nat Microbiol 10, 3346–3361 (2025). https://doi.org/10.1038/s41564-025-02180-8

Received: 03 April 2025
Accepted: 07 October 2025
Published: 06 November 2025
Version of record: 06 November 2025
Issue date: December 2025
DOI: https://doi.org/10.1038/s41564-025-02180-8