Main

Among the most widespread genes in all branches of life, transposons are potent agents of genetic change, as they mediate genome rearrangements through a variety of mechanisms. Recently, a group of transposon-encoded accessory proteins, termed obligate mobile element-guided activity (OMEGA) systems, were discovered to possess DNA cleavage activity, guided by a noncoding RNA called ωRNA, and are thought to be ancestors of Cas9 and Cas12 effectors from the class 2 clustered regularly interspaced short palindromic repeats (CRISPR) nuclease family1,2,3. One class of OMEGA proteins, called TnpB, has evolved into different Cas12 subtypes on multiple occasions, helping to explain the diversity of that family of CRISPR effectors4,5,6,7. Indeed, bioinformatics analyses of TnpB homologs indicate a wide range of both structure and function, with diverse protein architectures and catalytic-site geometries4,8. Eukaryotic TnpB homologs, referred to as Fanzors, can be identified in a wide range of organisms, including protists, fungi, arthropods, plants and eukaryotic viruses, and similarly function as RNA-guided nucleases6,7,9,10,11. Fanzors have attracted considerable interest as genome-editing tools, both for their natural functionality in eukaryotic cells and their substantially smaller size compared to Cas9 and Cas12 proteins. Fanzors are broadly categorized into two distinct clades, Fanzor1 and Fanzor2. Recent structural studies were conducted on Fanzor1 (refs. 9,10) but similar information remained lacking for the more compact Fanzor2. Here, we set out to fill that gap and characterize a representative member of the Fanzor2 clade to understand how these endonucleases recognize target DNA, as well as their relationship to Fanzor1 and the TnpB superfamily.

Results

Structure of ApmFz2–ωRNA–target DNA ternary complex

We reconstituted a ternary complex of Acanthamoeba polyphaga mimivirus Fanzor2 (ApmFz2) with the native ωRNA scaffold and target DNA substrate11 and determined its structure. ApmFz2 constitutively associates with its ωRNA (247 nt); therefore, we coexpressed it with an ωRNA scaffold to promote complex stability. The ωRNA construct was designed with a hepatitis delta virus self-cleaving ribozyme at the 3′ end to produce a fixed-length 21-nt guide RNA (gRNA)12,13. The resulting ternary complex is biochemically active (Extended Data Fig. 1a) and exhibits cleavage activity consistent with previous results11. The target DNA substrate used in cryo-electron microscopy (cryo-EM) imaging was designed to form an RNA–DNA hybrid and promote ternary complex formation (Fig. 1a).

Fig. 1: Cryo-EM structure of ApmFz2 ternary complex.

a, Diagram showing target DNA substrate (bottom) annealed to ωRNA (top). Nucleotides not observed in the cryo-EM structure are light gray. The gRNA is pink and the TAM is purple. On the DNA molecule, TS marks the target strand and NTS marks the nontarget strand. b, Top: domain organization of ApmFz2, with domain boundaries indicated by residue numbers. NTD, white; REC, aqua; WED, orange-yellow; RuvC, green; ZnF, pink. Bottom: detailed annotation of the NTD. The NLS was not observed in the cryo-EM map. The RuvC-R (blue-green) and thumb (blue) regions structurally reinforce the RuvC and REC domains, respectively. Gray boxes indicate linker regions not specifically assigned to a domain. c, Cryo-EM reconstruction (top) and atomic model (bottom) of ApmFz2 ternary complex. Domains and nucleic acid molecules are the same colors as in a,b, except for ωRNA, which is shown in white. Insets in d,e are boxed and labeled. d, Close-up view of the thumb domain associated with target DNA in conjunction with the WED and REC domains. A transparent surface is overlaid on the atomic model and the thumb domain is colored according to its electrostatic potential, with blue indicating a highly basic surface. e, View of the active site bound to the precleaved target DNA substrate. The catalytic triad is shown as green sticks and the magnesium ion is shown as a green sphere.

The resulting 2.99 Å resolution cryo-EM map (Table 1 and Extended Data Fig. 1) enabled nearly complete building of the protein (468 of 520 aa) (Fig. 1b,c) and building of roughly half of the ωRNA (119 of 247 nt). The nuclear localization signal (NLS; corresponding to residues 1–53) is not observed, consistent with disorder predictions11. ApmFz2 is much more similar to TnpB than to a previously characterized Fanzor1 (ref. 9) (Extended Data Fig. 2), indicating a closer relationship to prokaryotic TnpB, at least at an architectural level, as previously suggested9,10,11. Like TnpB, ApmFz2 has a recognition (REC) domain, a wedge (WED) domain, a RuvC domain and a zinc finger (ZnF) domain (Fig. 1b). Notably, ApmFz2 has an N-terminal domain (NTD; residues 53–130) that is not observed in any available TnpB structure (Extended Data Figs. 2 and 3). Part of this NTD (residues 65–91) appears to complete the RuvC fold and to interact with ωRNA (described in detail below); thus, we annotated it as the RuvC-R domain (Fig. 1b,c). Another region of the NTD (residues 109–130) appears to clasp onto the target DNA substrate in coordination with the REC and WED domains (Fig. 1c,d); thus, we call it the ‘thumb’ (Fig. 1b). The thumb is highly basic and seems to sterically invade the target DNA duplex adjacent to the transposon-adjacent motif (TAM) (Fig. 1d). The thumb may serve to stabilize or further unwind the target DNA and guide the target strand into the central channel.
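As a quick check on the model completeness quoted above, the following minimal Python sketch computes the modeled fraction of the protein and of the ωRNA from the reported counts. The function name `fraction_built` is hypothetical and introduced only for illustration.

```python
def fraction_built(modeled: int, total: int) -> float:
    """Return the fraction of residues or nucleotides present in the model."""
    if total <= 0 or not 0 <= modeled <= total:
        raise ValueError("expected 0 <= modeled <= total and total > 0")
    return modeled / total


# Counts taken from the text: 468 of 520 aa and 119 of 247 nt.
protein_fraction = fraction_built(468, 520)
rna_fraction = fraction_built(119, 247)

print(f"protein modeled: {protein_fraction:.1%}")
print(f"omegaRNA modeled: {rna_fraction:.1%}")
```

With these counts, about 90% of the protein and about 48% of the ωRNA are modeled.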

Table 1 Cryo-EM data collection, refinement and validation statistics

TAM recognition

We were particularly interested in understanding the basis of TAM recognition, because ApmFz2 has a distinct TAM motif (5′-GGG-3′)11 compared to characterized Fanzor1 and TnpB TAM motifs, which tend to be more AT-rich (5′-CATA-3′ and 5′-TTGAT-3′, respectively)2,9. Consistent with expectations, we observe Arg194 and His215 in the REC domain forming base-specific interactions with the nontarget strand, with dG(0) and dG(−2) (Extended Data Fig. 4a–c). On the target strand, Glu260 in the WED domain forms base-specific interactions with dC(1) (Extended Data Fig. 4d). In contrast to existing Fanzor1 and TnpB structures, we observe only minimal interactions upstream of the TAM (Extended Data Fig. 4b), consistent with the shorter TAM sequence motif in Fanzor2 (3 nt, compared to 4–5 nt in Fanzor1 (ref. 9) and TnpB (ref. 2)).
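To illustrate how the short 5′-GGG-3′ TAM constrains target selection, the sketch below scans a nontarget-strand sequence for the TAM and reports the adjacent 21-nt stretch as a candidate target. It rests on assumptions that are not established by the structure itself: the input is the nontarget strand written 5′ to 3′, the target lies immediately 3′ of the TAM on that strand, and the function name and example sequence are hypothetical.

```python
TAM = "GGG"      # ApmFz2 TAM reported in the text (5'-GGG-3')
GUIDE_LEN = 21   # fixed guide length used in this study


def find_candidate_targets(nts: str, tam: str = TAM, guide_len: int = GUIDE_LEN):
    """Return (tam_start, target) pairs for every TAM with a full-length
    target immediately 3' of it. Overlapping TAM hits are all reported."""
    seq = nts.upper()
    hits = []
    start = seq.find(tam)
    while start != -1:
        target_start = start + len(tam)
        target_end = target_start + guide_len
        if target_end <= len(seq):  # skip TAMs too close to the 3' end
            hits.append((start, seq[target_start:target_end]))
        start = seq.find(tam, start + 1)  # step by 1 to allow overlaps
    return hits


# Hypothetical nontarget-strand sequence, for illustration only.
example = "AT" + "GGG" + "ACGTACGTACGTACGTACGTA" + "TT"
for tam_start, target in find_candidate_targets(example):
    print(tam_start, target)
```

Because the TAM is only 3 nt long, candidate sites are expected to be frequent in GC-rich sequence, so a real design workflow would need further filtering that is outside the scope of this sketch.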

ApmFz2 ternary complex active-site architecture

Fanzor2, like TnpB, cleaves single-stranded DNA (ssDNA) downstream of the TAM and intriguingly does not exhibit collateral (that is, nonspecific) DNA cleavage activity11. Previous TnpB ternary complexes did not have sufficient density to build a complete RuvC and ZnF active site (Extended Data Fig. 5), which was attributed to the flexibility of the cleavage domain14,15. We observe the RNA–DNA heteroduplex occupying the central channel, formed by the WED, REC and RuvC domains9,14,15 (Fig. 1c and Extended Data Fig. 4e–g). In contrast to TnpB, we observe the entire density of the catalytic site within the RuvC and ZnF domains (Fig. 1c), along with four nucleotides of ssDNA occupying the active site (Fig. 1e). The resolution was insufficient for unambiguous assignment of all four bases but it was sufficient for distinguishing a purine at the 3′ end (Extended Data Fig. 6a). Therefore, we built a 4-nt 3′-ACCC-5′ ssDNA model. In addition, the ssDNA occupying the active site is not cleaved at the expected location according to the active-site geometry (Extended Data Fig. 6a). Within the active site, a catalytic triad is formed by Asp324, Glu467 and Asp501 (Fig. 1e). Notably, Glu467 is shifted 50 residues toward the C terminus relative to its canonical position in the TnpB amino acid sequence14,15 (Extended Data Figs. 2 and 7). This rearranged catalytic site is unique to Fanzor and its ancestral TnpB (commonly referred to as TnpB2 or pro-Fanzor), suggesting that they represent a distinct evolutionary branch, separate from the Cas12 family of CRISPR endonucleases10,11. The catalytic triad coordinates a single Mg2+ ion (Fig. 1e and Extended Data Fig. 6a). This contrasts with the postcleavage SpuFz1 structure and cleavage-inhibited Cas12 structures, which both show the catalytic triad coordinating two Mg2+ ions9,16 (Extended Data Fig. 5). Together with the evidence showing that our purified ApmFz2 is biochemically active (Extended Data Fig. 1a), this observation suggests that the ApmFz2 ternary complex captured by cryo-EM is in an inhibited state. This might be because of the differing conditions used to assemble the ternary complex for cryo-EM imaging, with an excess of DNA oligonucleotides (which could cause nonspecific interactions) and 2 mM Mg2+ (lower than the 10 mM Mg2+ used for our biochemical assay). The latter difference suggests that binding of the second Mg2+ may be rate limiting.

ωRNA architecture and recognition

Approximately half of the ωRNA scaffold is not visible because of the flexibility of its long stem loops (Fig. 2a). For the remainder of the ωRNA, the high quality of the cryo-EM map allowed us to distinguish purines and pyrimidines (Extended Data Fig. 6b,c) and to assign the nucleotide sequence with confidence. The last 156 nt of the ωRNA scaffold (−160 to −5) are base-paired, as correctly predicted by RNA secondary-structure prediction methods11. However, the first 45 nt (−205 to −161) form long-range interactions, including a pseudotriplex and pseudoknot, with nucleotides at positions −4 to 0 (Fig. 2a). It is worth noting that the RNA pseudoknot is a feature shared between TnpB and ApmFz2 (Fig. 2b); it forms the core of the ωRNA scaffold and is located next to the WED domain, both key characteristics in TnpB and Cas12 families14,15,16,17,18,19,20. The WED domain, in tandem with the RuvC and RuvC-R domains, forms a groove that recognizes the pseudoknot (Extended Data Fig. 8a–c). Remarkably, the RuvC-R subdomain along with RuvC appears to make base-specific and backbone interactions, mediated by Asp397 and Lys401 to G(−192):C(−2) and Asp397, Lys401, Ser83 and Asn82 to C(−191):G(−3) (Extended Data Fig. 8c). The remainder of the ωRNA scaffold is recognized solely by the RuvC domain, which appears to make backbone-specific hydrogen-bonding interactions with the stem 3 and stem 2 portions of the ωRNA (Extended Data Fig. 8d,e). In contrast to the TnpB and Fanzor1 structures, we do not observe any protein interactions with stem 1, which is recognized by the RuvC and WED domains in Fanzor1 (ref. 9) and TnpB (ref. 15).
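The nucleotide counts above follow from the inclusive numbering of the ωRNA scaffold, in which position 0 is the last scaffold nucleotide. A minimal sketch of that bookkeeping is given below; the helper `span_length` and the segment labels are hypothetical names used only for illustration.

```python
def span_length(first: int, last: int) -> int:
    """Number of nucleotides in an inclusive range of positions."""
    if last < first:
        raise ValueError("last must be >= first")
    return last - first + 1


# Scaffold segments as described in the text (positions are inclusive).
segments = {
    "long-range 5' region": (-205, -161),
    "base-paired region": (-160, -5),
    "3' pseudoknot-forming end": (-4, 0),
}

lengths = {name: span_length(*bounds) for name, bounds in segments.items()}
scaffold_total = sum(lengths.values())

for name, n in lengths.items():
    print(f"{name}: {n} nt")
print(f"scaffold positions -205 to 0: {scaffold_total} nt")
```

The three segments are contiguous, so their lengths (45, 156 and 5 nt) sum to the 206 positions spanned by −205 to 0.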

Fig. 2: ωRNA architecture and structural comparison of existing TnpB structures.

a, A 2D schematic of ωRNA, showing Watson–Crick base pairing (solid lines) and noncanonical interactions. The ωRNA scaffold and guide region span nucleotides −205 to 0 and 1 to 21, respectively. PK, pseudoknot. Disordered regions are in a dashed gray box. b, Comparison of protein and ωRNA structural features across known TnpB ternary complex structures: ApmFz2 (this study), Deinococcus radiodurans TnpB (PDB 8H1J) and Spizellomyces punctatus Fanzor1 (PDB 8GKH). Top: protein domain diagram, with domains colored as in Fig. 1a. Middle: RNP complex structure. The ωRNA is colored white. Bottom: the ωRNA structure, with structural features colored as in a, shown in the same view as in the ternary structure (middle row).

Our findings show that some essential structural features within the ωRNA scaffold, such as the pseudoknot, may be obscured by secondary-structure predictions. This highlights the importance of experimental structure determination to reveal functional features that are conserved across large evolutionary timescales in ribonucleoprotein (RNP) complexes. Fanzor1 lacks the aforementioned pseudoknot RNA motif (Fig. 2b), establishing that the pseudoknot is not conserved across Fanzors9. We modeled the long stem loops that lack density (Fig. 2a), predicting that they project away from the core of the structure (Extended Data Fig. 9), which could indicate that they may not all be essential for activity. Indeed, previous studies showed that truncation of the RNA long stem loops of TnpB and Fanzor1 did not affect activity9,14,15. Furthermore, ωRNAs across TnpB homologs typically contain predicted long stem loops, suggesting that the ωRNA scaffold has a wide range of structural variability, consistent with the idea that not all ωRNA structural features are essential. Lastly, our Fanzor2 structure reveals a distinct evolutionary trajectory in which every RNA domain has increased in size, in contrast with the available Fanzor1 structure9, which shows the RNA domains have become truncated or have disappeared entirely (Fig. 2 and Extended Data Fig. 10).

Discussion

Collectively, our structure shows how the compact eukaryotic Fanzor2 carries out ωRNA recognition, TAM recognition and target DNA loading. We also reveal the architecture of its ωRNA, highlighting key features that distinguish Fanzor2 and Fanzor1. The unique structured N-terminal extension of Fanzor2 has subdomains that reinforce the core of the protein and interact with the DNA duplex and RNA pseudoknot. The rearranged catalytic site highlights the plasticity of TnpB family effectors. Our findings provide a framework for future protein engineering directions and advance our understanding of the evolution from prokaryotic TnpB proteins to eukaryotic Fanzor proteins.

Methods

Protein production and purification

ApmFz2 was overexpressed in Escherichia coli BL21 star (DE3) cells. The cells were cotransformed with pCDF-ApmFz2-ωRNA and pET15b-ApmFz2 expression plasmids (Supplementary Table 1). A single colony was used to grow a starter culture overnight. Then, 10 ml of starter culture was used to inoculate 1 L of 2×YT medium containing 100 μg ml−1 ampicillin and 50 μg ml−1 spectinomycin, and the culture was grown at 37 °C with shaking until the optical density at 600 nm reached ~0.7. Protein overexpression was induced with 0.6 mM IPTG and the culture was grown at 18 °C for 18 h. The cells were harvested by centrifugation at 6,240g for 15 min (4 °C) and the cell pellet was frozen at −80 °C until needed. The cell pellet was resuspended in lysis buffer (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5% glycerol, EDTA-free cOmplete protease inhibitor cocktail (Roche) and 1 mM DTT). Cells were disrupted using a cell disruptor (Constant Systems) at 20,000 psi. The lysate was cleared by centrifugation at 48,380g for 30 min at 4 °C. The cleared lysate was incubated with pre-equilibrated Strep-Tactin Sepharose resin (IBA Life Sciences) for 30 min at 4 °C and then applied to a gravity column. The column was washed with 15 column volumes of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5% glycerol and 1 mM DTT. The protein was eluted with 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5% glycerol, 1 mM DTT and 5 mM d-desthiobiotin. The eluted fractions were verified on 4–20% SDS–PAGE gels. The fractions were pooled together and diluted with 20 mM Tris-HCl pH 8.0, 50 mM NaCl, 5% glycerol and 1 mM DTT buffer until the NaCl concentration reached 200 mM. Next, the resulting sample was loaded onto a 5-ml HiTrap Heparin column (Cytiva) pre-equilibrated with buffer A (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 5% glycerol and 1 mM DTT). The RNA-bound samples were eluted with 40–60% buffer B (20 mM Tris-HCl pH 8.0, 1 M NaCl, 5% glycerol and 1 mM DTT). The eluted fractions were verified by SDS–PAGE and the desired protein fractions were pooled together and concentrated using a 50 kDa-cutoff membrane filter unit (Millipore). The concentrated sample was injected onto a Superose 6 Increase 10/300 GL column (Cytiva) equilibrated with buffer C (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 2 mM MgCl2, 10 µM ZnCl2, 5% glycerol and 1 mM DTT). The purified protein was concentrated to 2.4 mg ml−1.
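The dilution step before heparin chromatography can be planned with a simple mass balance, sketched below. The calculation assumes that the pooled Strep-Tactin eluate is at the 500 mM NaCl of the elution buffer and that volumes are additive; the function name and the 10-ml pool volume are hypothetical.

```python
def diluent_volume(sample_vol: float, c_sample: float,
                   c_diluent: float, c_target: float) -> float:
    """Volume of diluent needed to bring a sample from c_sample to c_target.

    Solves c_sample*V_s + c_diluent*V_d = c_target*(V_s + V_d) for V_d.
    Concentrations share one unit (here mM); volumes share another (here ml).
    """
    low, high = sorted((c_sample, c_diluent))
    if not low <= c_target <= high or c_target == c_diluent:
        raise ValueError("target must lie between the two concentrations "
                         "and differ from the diluent concentration")
    return sample_vol * (c_sample - c_target) / (c_target - c_diluent)


# Hypothetical 10-ml pool at 500 mM NaCl, diluted with 50 mM NaCl buffer
# to reach the 200 mM NaCl of heparin buffer A.
needed = diluent_volume(10.0, c_sample=500.0, c_diluent=50.0, c_target=200.0)
print(f"add {needed:.1f} ml of 50 mM NaCl buffer")
```

Under these assumptions, two volumes of the 50 mM NaCl buffer are needed per volume of eluate.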

In vitro cleavage assay

The DNA substrate was produced using PCR from gBlocks (Integrated DNA Technologies) as templates (Supplementary Table 1). The amplified DNA substrate was purified using a QIAquick PCR purification kit (Qiagen).

The cleavage reaction (25 µl) was set up by mixing 1.6 µM purified DNA substrate with 5 µM freshly purified Fz2 in reaction buffer (100 mM NaCl, 50 mM Tris-HCl pH 7.9, 10 mM MgCl2 and 1 mM DTT) at 25 °C and incubated at 37 °C for 2 h. The tube was then incubated on ice for 10 min to quench the reaction. It was then heated at 95 °C for 15 min followed by cooling to 50 °C after the addition of 10 μg of RNase A (Qiagen) for 10 min. Next, 50 µl of buffer-saturated phenol was added and the tube was vortexed and spun immediately. Then, 20 µl of the aqueous phase was aspirated out and mixed with 3 µl of 6× New England Biolabs DNA loading dye. The sample was then run on a 5% Mini-PROTEAN TBE gel (Bio-Rad) as per the protocol recommended by the manufacturer. Gels were stained with 1× SYBR gold (Thermo Fisher Scientific) and imaged on a ChemiDoc gel imager (Bio-Rad). Each in vitro cleavage assay was performed in triplicate.
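For orientation, the molar amounts implied by the cleavage reaction can be worked out directly, because 1 µM corresponds to 1 pmol per µl. The short sketch below uses only the concentrations and volume stated above; the function and variable names are hypothetical.

```python
def picomoles(conc_uM: float, volume_ul: float) -> float:
    """Amount in pmol, using 1 uM = 1 pmol per ul."""
    return conc_uM * volume_ul


REACTION_VOLUME_UL = 25.0  # total reaction volume from the text

dna_pmol = picomoles(1.6, REACTION_VOLUME_UL)      # DNA substrate
protein_pmol = picomoles(5.0, REACTION_VOLUME_UL)  # purified Fz2
molar_excess = protein_pmol / dna_pmol

print(f"DNA: {dna_pmol:.0f} pmol, Fz2: {protein_pmol:.0f} pmol, "
      f"Fz2:DNA = {molar_excess:.2f}:1")
```

The reaction therefore contains about 40 pmol of DNA substrate and 125 pmol of Fz2, roughly a threefold molar excess of protein over substrate.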

Target DNA preparation

The reaction mixture contained equimolar oligo concentrations of Fz2_CEM_sub_top and Fz2_CEM_sub_bot (Supplementary Table 1) in nuclease-free water. The mixture was annealed in a thermocycler (95 °C to 10 °C, Δ1.5 °C ramp per min).
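The duration of the annealing ramp follows from the temperature span and the ramp rate. The sketch below assumes a linear ramp with no hold steps, which the text does not specify; the function name is hypothetical.

```python
def ramp_duration_min(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Time in minutes for a linear temperature ramp between two setpoints."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(start_c - end_c) / rate_c_per_min


# 95 °C down to 10 °C at 1.5 °C per min, as stated in the text.
duration = ramp_duration_min(95.0, 10.0, 1.5)
print(f"annealing ramp takes about {duration:.1f} min")
```

Under this assumption, the ramp takes a little under 1 h (85 °C at 1.5 °C per min).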

Sample preparation for cryo-EM

The ApmFz2–RNA–DNA complex was reconstituted in vitro by mixing 20 μM ApmFz2–RNA with 30 μM assembled target DNA in buffer C (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 2 mM MgCl2, 10 µM ZnCl2, 5% glycerol and 1 mM DTT) at 37 °C for 10 min.

Cryo-EM grid preparation

UltrAuFoil grids (R1.2/1.3 300 mesh, MiteGen) were cleaned for 40 s in a Solarus II plasma cleaner (Gatan) before the application of 3.5 µl of the sample (~1.5 mg ml−1) and plunge-freezing in liquid ethane using a Vitrobot mark IV (FEI) with 95% chamber humidity at 10 °C.

Cryo-EM data acquisition and processing

ApmFz2–RNA–DNA complex

Data were collected on a Talos Arctica microscope (Thermo Fisher Scientific) equipped with a K3 direct electron detector and a BioQuantum energy filter. Sample grids were imaged at 200 kV, with an intended defocus range of −2.25 to −0.5 μm and a magnification of ×79,000 in electron counting mode (1.044 Å per pixel). Videos were collected with a total dose of 51 e− per Å2. A total of 3,205 videos were recorded with EPU software (Thermo Fisher Scientific). Downstream processing was performed in cryoSPARC 4.4.0 (ref. 21). Videos were motion-corrected and summed using Patch motion correction in cryoSPARC. The contrast transfer function (CTF) was estimated using Patch CTF in cryoSPARC. Initially, 1,000 micrographs were processed, particles were picked using blob picker and extracted and two-dimensional (2D) classes were generated. The good classes were used as templates to perform template-based particle picking and 2D classification. Subsequently, good 2D classes were used for training the Topaz model for the entire dataset22. Particles were extracted with a 360-pixel box and subjected to multiple rounds of 2D classification to remove junk particles followed by multiple rounds of heterogeneous refinements. The resulting best class was used for nonuniform refinement in cryoSPARC. CTF parameters were refined on a per-micrograph and per-particle basis using cryoSPARC global CTF refinement and local CTF refinement, respectively. Particles were then subjected to local motion correction and then to homogeneous refinement followed by nonuniform refinement23. The resolution was estimated using the gold-standard Fourier shell correlation method. Local resolution was estimated using cryoSPARC.
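A few quantities implied by the acquisition parameters are convenient to have at hand, for example the Nyquist limit and the physical size of the extraction box. The sketch below derives them from the stated pixel size, box size and total dose; it assumes that particles were extracted without binning, and the function names are hypothetical.

```python
def nyquist_limit_A(pixel_size_A: float) -> float:
    """Finest resolvable spacing (in Å) for a given pixel size."""
    return 2.0 * pixel_size_A


def box_edge_A(box_px: int, pixel_size_A: float) -> float:
    """Physical edge length (in Å) of a square extraction box."""
    return box_px * pixel_size_A


def dose_per_pixel(dose_per_A2: float, pixel_size_A: float) -> float:
    """Total electrons per pixel for a given dose per square ångström."""
    return dose_per_A2 * pixel_size_A ** 2


PIXEL_SIZE_A = 1.044  # Å per pixel, from the text
BOX_PX = 360          # extraction box, from the text
TOTAL_DOSE = 51.0     # electrons per Å^2, from the text

nyquist = nyquist_limit_A(PIXEL_SIZE_A)
box = box_edge_A(BOX_PX, PIXEL_SIZE_A)
per_pixel = dose_per_pixel(TOTAL_DOSE, PIXEL_SIZE_A)

print(f"Nyquist limit: {nyquist:.3f} Å")
print(f"box edge: {box:.1f} Å")
print(f"dose: {per_pixel:.1f} electrons per pixel")
```

With these values, the Nyquist limit is about 2.09 Å, so the reported 2.99 Å resolution is not limited by sampling, and the box edge is about 376 Å.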

Model building, refinement and analysis

The ApmFz2–ωRNA–DNA ternary structure was generated using an initial model of the protein predicted from AlphaFold2 and an initial model of the ωRNA predicted from RNAcomposer24,25. The resulting models were first docked into the cryo-EM density and then manually remodeled or rebuilt where needed using Coot version 0.9.8.2 (ref. 26). For the DNA substrate, real-space refinement was carried out in PHENIX 1.21, with both base-pair and secondary-structure restraints being enforced27. The final structure was validated using PHENIX27 and MolProbity28. Structural representations for figures were created using UCSF ChimeraX29 and Adobe Illustrator (https://adobe.com/products/illustrator).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.