Main

Proteins rarely perform their biological functions in isolation but rather rely on protein–protein interactions (PPIs) to execute complex biological processes. Designing protein binders that can specifically target and regulate PPIs therefore holds immense therapeutic and biotechnological potential. However, traditional methods for generating protein binders, such as immunization, antibody library screening or directed evolution, are often laborious, time-consuming and provide limited control over the target site.

Computational protein design offers a powerful alternative, enabling the tailoring of binders to specific targets and binding sites. Physics-based methods such as Rosetta allowed early binder design by means of scaffolding and side-chain optimization2,3. However, such methods suffer from low experimental success rates (less than 0.1%) and require the sampling of many designs2,4. Moreover, they typically require the docking of predefined scaffolds onto a fixed target structure, leading to surface incompatibilities and suboptimal binding, or even precluding the targeting of certain sites.

Recent breakthroughs in deep learning have revolutionized the field of biomolecular modelling. Models such as AlphaFold2 (AF2)1 have demonstrated remarkable capabilities in accurately predicting protein structures and complex PPIs. Indeed, AF2 filtering has been shown to increase the success rates of binder design by evaluating the plausibility of predicted complexes4,5. Deep learning has also been successfully applied for de novo design of proteins and binders. The current state-of-the-art methods involve the use of RFdiffusion5 for backbone generation coupled with ProteinMPNN sequence generation6. When applied to binder design, this approach shows significantly improved success rates compared with previous methods5. However, RFdiffusion relies on sequence design over side-chain-free backbones placed at a rigid target interface, with binder selection ultimately depending on AF2-based complex prediction to identify plausible interactions. This highlights a gap between backbone generation and functional interface design that AF2 filtering helps to bridge.

Given the use of AF2 in improving binder filtering success, we proposed that we could harness it directly for the design of protein binders. We present BindCraft, a user-friendly pipeline for de novo design of protein binders that requires minimal user intervention and computational expertise. BindCraft leverages backpropagation through the AF2 network to efficiently hallucinate new binders and interfaces (Fig. 1a). We demonstrate the efficiency of our pipeline on 12 diverse, challenging and therapeutically relevant protein targets (Fig. 1b). We identify several high-affinity binders for each target without the need for high-throughput screening of hundreds to thousands of designs experimentally. This marks an important advancement in the design of protein binders on demand, and makes binder design accessible to research groups without expertise in computational design methods or access to high-throughput screening facilities.

Fig. 1: De novo binder design using BindCraft.

a, Schematic representation of the BindCraft binder design pipeline. Given a target protein structure, a binder backbone and sequence are generated using AF2 multimer, then the surface and core of the binder are optimized using MPNNsol while keeping the interface intact. Finally, designs are filtered based on AF2 monomer model prediction. b, Overview of protein targets for binder design. Parts of the model coloured in green were used during design, grey areas were excluded. Values in the blue box indicate the number of successful designs, where binding was observed on SPR measurement versus the total number of designs tested. Values in the yellow box indicate the measured Kd of the highest affinity binder without experimental sequence optimization, whereas values in orange boxes indicate estimated Kd* values due to poor fit. PD-1 binders were tested as a bivalent Fc fusion.

Accurate design of de novo binders

Our goal was to create an accessible, efficient and automated pipeline leveraging AF2 for accurate binder design with minimal experimental screening. We use the ColabDesign implementation of AF2 to backpropagate hallucinated binder sequences through AF2 weights and calculate an error gradient. This error gradient is used to update and optimize the binder sequence to fit specific design criteria (Methods), as in previous hallucination approaches7,8,9,10. By iterating over the network, we can enable the generation of binder structure, sequence and interface concurrently (Fig. 1a). In contrast to methods such as RFdiffusion5 or RIFdock2,4, which keep the target backbone fixed during design, BindCraft repredicts the binder–target complex at each design iteration. This allows for defined levels of flexibility on the side chain and backbone for both binder and target, resulting in backbones and interfaces that are moulded to the target binding site. The resulting target backbone root mean square deviation (r.m.s.d.) ranges from 0.5 Å to 5.5 Å (Extended Data Fig. 1a). Flexibility can further be increased by masking the sequence of the input template and providing only Cα coordinates (Supplementary Fig. 1a). To mitigate the generation of purely alpha-helical binders, we apply a ‘negative helicity loss’, which enables the design of fully beta-sheeted binders (Extended Data Fig. 1b), albeit with a decrease in in silico success rates (Supplementary Fig. 1b,c).
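The gradient-driven sequence update described above can be illustrated with a deliberately small sketch. It is not the AF2 or ColabDesign code: the structure-prediction network is replaced by a hypothetical stand-in loss (cross-entropy against an arbitrary target profile), and all names and values (`toy_loss_and_grad`, `optimize_sequence`, the learning rate, the step count) are assumptions chosen for illustration only.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    # Encode a sequence as a (length, 20) one-hot profile.
    profile = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        profile[pos, AMINO_ACIDS.index(aa)] = 1.0
    return profile


def toy_loss_and_grad(logits, target_profile):
    # Hypothetical stand-in for the design loss. In the real pipeline the
    # loss is computed from AF2 outputs and the gradient is obtained by
    # backpropagation through the network; here both are analytic.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    n_positions = logits.shape[0]
    loss = -np.sum(target_profile * log_probs) / n_positions
    # Gradient of softmax cross-entropy with respect to the logits
    # (valid because each row of target_profile sums to 1).
    grad = (np.exp(log_probs) - target_profile) / n_positions
    return loss, grad


def optimize_sequence(target_profile, steps=200, learning_rate=1.0, seed=0):
    # Start from near-uniform logits and follow the error gradient.
    rng = np.random.default_rng(seed)
    logits = 0.01 * rng.standard_normal(target_profile.shape)
    losses = []
    for _ in range(steps):
        loss, grad = toy_loss_and_grad(logits, target_profile)
        losses.append(loss)
        logits -= learning_rate * grad
    # Discretize the relaxed sequence by taking the top amino acid per position.
    sequence = "".join(AMINO_ACIDS[i] for i in logits.argmax(axis=-1))
    return sequence, losses


if __name__ == "__main__":
    designed, history = optimize_sequence(one_hot("MKTAYIAK"))
    print(designed, history[0], history[-1])
```

The point of the sketch is only the loop structure: a continuous sequence representation is updated by an error gradient until it satisfies the loss, then discretized. In BindCraft the loss terms and the network differ entirely from this toy objective.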

We use AF2 multimer11 for designing initial binders, as this version of AF2 was trained on protein complexes and would probably be able to more accurately model PPIs compared with AF2 monomer1. We note that AF2 multimer hallucinates on average 20% larger interfaces, with a larger proportion of loops and higher confidence than the AF2 monomer model (Supplementary Fig. 1d). We use all five trained model weights of AF2 multimer to avoid overfitting of sequences to a single model. However, we and others12,13 have previously demonstrated that AF2-hallucinated proteins can show low levels of expression when tested experimentally. We therefore subsequently optimize the sequence of the binder core and surface using a message-passing neural network (MPNNsol)6,12 while keeping the interface intact (Fig. 1a). The optimized binder sequences are repredicted using the AF2 monomer model1. This model was exclusively trained on monomeric proteins, which minimizes prediction bias of PPIs and enables robust filtering for high quality interfaces. Last, as deep learning models have been shown to sporadically produce physically improbable results1,11, we filter the predicted designs based on AF2 confidence metrics, as well as Rosetta physics-based scoring metrics (Methods).
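The final filtering step can be pictured as a conjunction of per-design cut-offs. The sketch below is a minimal illustration, not the pipeline's filter set: the metric names (`plddt`, `i_ptm`, `interface_energy`) and every threshold value are hypothetical placeholders, since the actual criteria are defined in the Methods.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    name: str
    plddt: float             # mean AF2 confidence, on a 0-1 scale in this sketch
    i_ptm: float             # AF2 interface pTM
    interface_energy: float  # hypothetical physics-based score, lower is better


# Hypothetical cut-offs for illustration only.
HYPOTHETICAL_THRESHOLDS = {"plddt": 0.8, "i_ptm": 0.5, "interface_energy": 0.0}


def passes_filters(design, thresholds=HYPOTHETICAL_THRESHOLDS):
    # A design is kept only if it satisfies every criterion.
    return (
        design.plddt >= thresholds["plddt"]
        and design.i_ptm >= thresholds["i_ptm"]
        and design.interface_energy < thresholds["interface_energy"]
    )


def filter_designs(designs, thresholds=HYPOTHETICAL_THRESHOLDS):
    return [d for d in designs if passes_filters(d, thresholds)]


if __name__ == "__main__":
    candidates = [
        DesignMetrics("design_a", 0.91, 0.72, -35.0),
        DesignMetrics("design_b", 0.88, 0.31, -20.0),  # fails i_ptm
        DesignMetrics("design_c", 0.65, 0.80, -40.0),  # fails plddt
    ]
    print([d.name for d in filter_designs(candidates)])
```

Combining learned confidence metrics with a physics-based score in one pass reflects the rationale given above: either kind of metric alone can let physically improbable predictions through.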

Each target shows varying levels of in silico design success, with 16.8–62.7% of initial AF2 trajectories showing satisfactory confidence metrics and 0.6–65.9% of MPNNsol-optimized designs passing the final computational filters after AF2 monomer complex reprediction (Supplementary Fig. 1e). In silico design success rates are dependent on the target protein and the length of binders being generated (Supplementary Fig. 1f). When compared to the state-of-the-art binder design approach RFdiffusion5, BindCraft yields similar success rates in terms of generation time across several targets and binder lengths (Extended Data Fig. 1c). Notably, we observe a difference in amino acid distribution at the binder interface, with an underrepresentation of bulky amino acids in RFdiffusion-generated designs (Extended Data Fig. 1d).

We benchmarked BindCraft on 12 targets to assess its generalizability (Fig. 1b). The designs show broad sequence and structural diversity (Supplementary Fig. 2a and Supplementary Data 1 and 2), with an average template modelling (TM) score of 0.62 and 14.4% sequence identity to closest Protein Data Bank (PDB) hits (Extended Data Fig. 1e). This suggests that designed proteins recover elements of known structural motifs while also sampling novel folds, consistent with the limited structural diversity expected for compact scaffolds. The high geometric and chemical complementarity of BindCraft binders (Supplementary Data 1) enables the design of new high-affinity interfaces (Extended Data Fig. 1e and Supplementary Fig. 2b). We assessed the novelty of the designed interfaces by comparison to known interactions in the PDB using PPIRef14,15. The designed interfaces showed an average TM score of 0.15 across all targets, indicating that they are distinct from naturally occurring PPIs.

All the described steps are automated into a single workflow, with settings optimized to ensure the design procedure is generalizable across different targets. This allows research groups without protein design expertise to generate binders on demand for any application. By minimizing human intervention needed to generate and sort high quality binder designs, BindCraft democratizes protein binder design and makes it accessible to a broader scientific community.

Binders targeting cell-surface receptors

To test the performance of our pipeline, we designed binders against therapeutically relevant cell-surface receptors and tested for binding activity in vitro. We first targeted the human PD-1 protein, a key immune checkpoint receptor expressed on the surface of T cells16. We purified and screened 53 designs for binding using biolayer interferometry (BLI) in a bivalent Fc-fusion format. We observed a binding signal for 13 binders, with the best binder showing an apparent dissociation constant (Kd*) lower than 1 nM (Fig. 2a,b), although the exact Kd could not be determined due to the extremely slow dissociation rate and avidity effect from the Fc-fusion construct. To confirm the binding site, we performed a competition assay with the well-characterized anti-PD-1 monoclonal antibody, pembrolizumab, which should engage the same binding site. Indeed, our binder could not outcompete the antibody binding (Kd = 27 pM), indicating it is targeting overlapping binding sites (Extended Data Fig. 2a).

Fig. 2: Binder design targeting cell-surface receptors.

a, Design model of binder2 in complex with PD-1. b, Representative BLI sensorgram showing binding kinetics of binder2 (bivalent Fc fusion) to PD-1. c, Design model of binder4 in complex with PD-L1. d, Binding affinity determination by SPR for the PD-L1–binder4 interaction. e, Design model of binder5 in complex with IFNAR2. f, Binding affinity determination by SPR for the IFNAR2–binder5 interaction. g, Design model of binder1 in complex with CD45. h, SPR binding affinity fit for binder1. i, Schematic of CpE-based cytotoxicity and CLDN1 binder inhibition. j, Single cycle kinetic analysis with SPR of CLDN1 binder12 binding to soluble analogues of CLDN1. k, Cell-based assay showing concentration-dependent inhibition of CpE cytotoxicity by CLDN1 binder9, binder12 and CpE inhibitor. Bar plots represent the mean of n = 2 replicates, with standard deviation indicated by error bars. l, MST measurements showing blocking of CpE binding to CLDN1 wild type when preincubated with binder12. MST data were plotted from a single representative measurement. Panel i was created using BioRender (https://biorender.com).

Encouraged by these results, we opted to test fewer designs for all subsequent targets to minimize experimental screening. We next designed binders against PD-L1 (ref. 16) and the interferon 2 receptor (IFNAR2)17, both important modulators of immune signalling. We tested nine designs against PD-L1, of which seven showed a binding signal (Supplementary Fig. 3a), whereas for IFNAR2 we could detect binding for three out of nine designs (Fig. 1b). The top performing binder4 against PD-L1 showed a Kd* of 615 nM (Fig. 2c,d) as determined by surface plasmon resonance (SPR) and an expected alpha-helical signature as measured by circular dichroism (Supplementary Fig. 3b). Size exclusion chromatography with multi-angle light scattering (SEC–MALS) analysis shows binder4 to be dimeric in solution, but to engage in a 1:1 binding mode with its target (Extended Data Fig. 2b). We then probed the binding of our PD-L1 binder4 using a previously characterized de novo designed binder3. We could confirm they compete for the intended target binding site (Extended Data Fig. 2c), while engaging in a distinct mode of binding compared with PD-1 (Extended Data Fig. 2d).

The top performing binder5 against IFNAR2 showed an affinity of 260 nM by SPR (Fig. 2e,f), a typical alpha-helical signature and high stability (Supplementary Fig. 3c), and monomeric nature in solution (Extended Data Fig. 2e). We tested binder5 against IFNAR2’s native binding partner, the cytokine interferon alpha 2 (IFNA2)17. We observe competition for the IFNA2 binding site, validating our designed binding mode (Extended Data Fig. 2f), while primarily occupying distinct binding sites (Extended Data Fig. 2g). To assess the specificity of our designs, we probed the binding of each top performing binder against other immunoglobulin-like fold receptors. Despite their structural similarity (Supplementary Data 2), we observe no off-target binding (Extended Data Fig. 2h). We observe the AF2 i_pTM metric to effectively discriminate the on-target interactions from off-targets (Extended Data Fig. 2h). These results demonstrate that we are able to efficiently design binders, straight from the computational design pipeline, against known binding sites, without the need for extensive screening to identify hits with nanomolar affinity.

Next, we sought to determine whether our pipeline could design binders against extracellular receptors lacking well-characterized binding sites. We selected CD45 as a target because of the structural complexity of its extracellular domain, comprising four immunoglobulin-like domains d1–d4 with heavy N-glycosylation in the smallest isoform18. We tested 16 binders experimentally, out of which 4 showed binding on SPR (Fig. 1b). The best performing binder1 showed a Kd of 14.7 nM and targeted the junction region of domains d3 and d4 (Fig. 2g,h). We also observed the expected alpha-helical signal in circular dichroism, validating the correct folding of our design (Supplementary Fig. 3d). These results indicate that BindCraft can also effectively design binders against new or previously uncharacterized binding sites.

Targeting unexplored binding sites

Membrane proteins lacking distinct extracellular domains are of critical biological and therapeutic importance, yet pose a significant challenge for binder design due to difficulties in experimental validation and screening. Computationally designed soluble analogues that retain natural epitopes12 offer a promising solution by enabling rapid prescreening of potential binders. To validate this strategy, we targeted claudins, which are critical for maintaining epithelial and endothelial tight junction barrier integrity19. Claudins are naturally targeted by Clostridium perfringens enterotoxin (CpE), which forms a membrane-penetrating pore that leads to cell death20. We proposed that binders competing with CpE for its binding site could mitigate cytotoxicity (Fig. 2i).

Using a soluble analogue of claudin 1 (sCLDN1)12, we designed binders against the extracellular domain and prescreened them for binding using two variants of the soluble analogue (Supplementary Fig. 4a,b). We tested seven binders and found all except binder17 to bind to sCLDN1-14 and sCLDN1-18 (Supplementary Fig. 4a), which both harbour the native CLDN1 extracellular epitope (Supplementary Fig. 4b). We observed the strongest binding signal for binder12, which showed nanomolar affinity for the soluble analogues (Fig. 2j). To assess the binders’ utility, we tested them against wild-type claudin 1 (CLDN1 WT) using a cell-based cytotoxicity assay. Here, binder9 and binder12 effectively inhibited CpE-based cytotoxicity, protecting CLDN1 WT-expressing cells from cell death in a concentration-dependent manner and on the order of a known CpE inhibitor (Fig. 2k and Supplementary Fig. 4c). Notably, both of these binders result from the same initial trajectory and carry the same interface residues.

To validate that the inhibition of cytotoxicity was the result of direct interactions with CLDN1 WT, we used microscale thermophoresis (MST). We found that both CpE and binder12 interacted with CLDN1 WT, and that preincubation of binder12 with CLDN1 WT blocked CpE binding, indicating competition for the same binding site (Fig. 2l and Extended Data Fig. 2i). The binders failed to protect claudin 4 (CLDN4)-expressing cells from CpE-induced toxicity (Supplementary Fig. 4d,e), most probably due to CpE’s roughly 400-fold higher affinity for CLDN4 (ref. 20). Our findings demonstrate that soluble analogues can enable the discovery of binders that effectively modulate membrane protein function.

To assess the generalizability of our pipeline for targeting proteins lacking known binding sites, we designed binders against a protein with no natural sequence homologues. We chose the de novo designed beta-barrel fold 14 (BBF-14)12, as beta-barrels are not commonly regarded as PPI partners. We purified the 11 top-scoring designs, of which 6 showed binding (Fig. 1b). The best binder, binder4 (Extended Data Fig. 3a), is composed of a mixed alpha-beta topology, with the interface formed by both the split beta-sheets and a helix motif. The beta-sheet interface is not mediated by backbone hydrogen bonding, but rather by side-chain interactions. Binder4 showed a Kd of 20.9 nM for BBF-14, as determined by SPR (Extended Data Fig. 3b). To assess the fidelity of our design procedure, we solved a structure of BBF-14 bound to binder4 (Extended Data Fig. 3c,d, Extended Data Table 1). When aligned on the BBF-14 target, binder4 has a backbone r.m.s.d. of 1.7 Å, confirming both the accuracy of the fold and the designed binding mode (Extended Data Fig. 3c). This result underscores our ability to generate binders purely based on structural information, without relying on existing binding sites or any influence from co-evolutionary data.
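A binder r.m.s.d. computed after alignment on the target measures the binding mode as well as the fold, because the binder is not refitted on its own. The sketch below shows one way to compute such a value with the Kabsch algorithm; it is an assumption about the procedure rather than the authors' analysis code, and the function names and the use of pre-matched coordinate arrays are illustrative.

```python
import numpy as np


def kabsch_transform(mobile, reference):
    # Rotation R and translation t that best superpose `mobile` onto
    # `reference` (both arrays of shape (N, 3) with matched atoms).
    mobile_center = mobile.mean(axis=0)
    reference_center = reference.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (reference - reference_center)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = reference_center - rotation @ mobile_center
    return rotation, translation


def binder_rmsd_aligned_on_target(model_target, model_binder,
                                  exp_target, exp_binder):
    # Fit the transform on target atoms only, then apply it to the binder.
    rotation, translation = kabsch_transform(model_target, exp_target)
    moved_binder = model_binder @ rotation.T + translation
    diff = moved_binder - exp_binder
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(30, 3)) * 10.0
    binder = rng.normal(size=(20, 3)) * 10.0
    angle = 0.7
    rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                      [np.sin(angle), np.cos(angle), 0.0],
                      [0.0, 0.0, 1.0]])
    shift = np.array([5.0, -3.0, 2.0])
    # Synthetic "experimental" complex: same target, binder displaced along x.
    exp_target = target @ rot_z.T + shift
    exp_binder = binder @ rot_z.T + shift + np.array([1.7, 0.0, 0.0])
    print(binder_rmsd_aligned_on_target(target, binder, exp_target, exp_binder))
```

In this synthetic case the target superposes exactly, so the reported value reflects only the rigid displacement applied to the binder.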

Last, we selected the conserved structural protein SAS-6 as a design target. SAS-6 oligomers are essential for centriole biogenesis in eukaryotes21. Using BindCraft, we generated several designs and experimentally tested nine top-scoring binders. Binder4 binds with 5.7 μM affinity to the monomeric form of CrSAS-6 (Extended Data Fig. 3e,f) and 4.2 μM affinity to the dimeric form (Extended Data Fig. 3g), indicating compatibility with its oligomeric form. It targets an overlapping site with the previously reported monobody MBCRS6-15 (Extended Data Fig. 3h), which shifts SAS-6 assembly from a ring to a helical structure22. We speculate that BindCraft enables on-demand binder design to probe biological function, even within higher-order assemblies.

Masking allergenic epitopes

Allergic rhinitis and seasonal allergies affect up to 50% of the population in some countries23. Current treatments primarily target global inflammation with immunosuppressants and monoclonal antibodies. However, neutralizing allergic reactions could potentially offer a more effective strategy for managing allergies. Allergens comprise a diverse group of proteins with different folds, biological functions and highly charged surfaces24. Generally, hydrophobic binding sites are considered more tractable for computational binder design2, making allergens more challenging targets.

To test BindCraft’s ability to target allergens, we designed binders against dust mite allergens Der f7 and Der f21, and the major birch allergen Bet v1, which is responsible for up to 95% of birch-related allergies25. We examined 10 designs against Der f7 experimentally and identified 4 binders (Fig. 1b), with binder2 showing the highest binding affinity with a Kd of 12.8 nM (Fig. 3a). To confirm the binding mode of binder2, we solved crystal structures in complex with Der f7 obtaining two crystal forms with resolutions of 2.2 Å and 3.0 Å (Extended Data Fig. 4a,b and Extended Data Table 1). Aligned on the allergen, binder2 shows a backbone r.m.s.d. of 1.7 Å (Fig. 3b), validating the design’s structural accuracy. Binder2 is monomeric in solution (Extended Data Fig. 4c) and binds the same epitope as mouse monoclonal antibodies raised against Der f7 (ref. 26).

Fig. 3: Designs occluding epitopes of common allergens.

a, Left: design model of binder2 against dust mite allergen Der f7. Right: SPR binding affinity fit for binder2. b, Crystal structure (coloured) of the Der f7–binder2 complex overlaid with the design model (grey). c, Left: design model of binder10 against dust mite allergen Der f21. Right: SPR binding affinity fit for binder10. d, Crystal structure (coloured) of the Der f21–binder10 complex overlaid with the design model (grey). e, Left: design model of binder2 against birch allergen Bet v1. Right: SPR binding affinity fit for binder2. f, SEC–MALS analysis of Bet v1 allergen (blue, expected molecular weight (MW) 18.5 kDa) and Bet v1 mixed with binder2 (orange, expected molecular weight 29.3 kDa). g, Cryo-EM structure (PDB 7MXL) of Bet v1 bound to commercial anti-Bet v1 REGN antibody mix. h, Competition assay on immobilized REGN5713–Bet v1 complex showing binding of the REGN5714 antibody but not Bet v1 binder2, confirming binding at the designed site. i, Blocking ELISA showing the capacity of the REGN antibody mix (orange) or binder2 (blue) to prevent the binding of Bet v1 to IgE in the sera of three patients allergic to birch. Number suffix represents individual serum from a patient. Data points represent the average of two technical replicates, with error bars depicting standard deviation.

Similarly, we evaluated seven binders against Der f21 and could detect binding for four designs by SPR (Fig. 1b). The best performing binder10 showed an apparent affinity of 793 nM (Fig. 3c). Although dimeric in solution (Extended Data Fig. 4d), a 2.6 Å resolution crystal structure validates a 1:1 mode of binding of binder10 against a highly charged helical site of Der f21 (Extended Data Fig. 4e). The binder10 shows a backbone r.m.s.d. of 3.1 Å, caused by an alternative rotamer conformation of an interface tyrosine (Fig. 3d). Mutational analyses of Der f21 indicate that our binders target epitopes distinct from those recognized by IgE in the sera of allergic individuals27.

Last, we identified two successful designs from seven tested binders against the birch allergen Bet v1 (Fig. 1b). Binder2 showed a 120 nM binding affinity by SPR (Fig. 3e), dimerizes in solution (Extended Data Fig. 4f), but in complex with Bet v1 shows a mass of 27.8 kDa, indicative of a 1:1 binding mode (Fig. 3f). The binder2 has a warped helical topology, where its C-terminal helix inserts itself into the ligand binding pocket of Bet v1 (ref. 28). To assess the specificity of allergen-targeting binders, we incubated the top binders with each of the three allergens. Even at 10 μM binder concentration, we observe no off-target binding to other allergens (Extended Data Fig. 4g), indicating high specificity of the designed anti-allergens.
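The stoichiometry argument from the SEC–MALS mass can be checked with simple arithmetic. The sketch below uses the expected masses given in the Fig. 3f legend (18.5 kDa for Bet v1, 29.3 kDa for the 1:1 complex); the binder monomer mass of 10.8 kDa is an inference from their difference, and the candidate stoichiometries and function names are assumptions made for illustration.

```python
def expected_mass_kda(n_target, n_binder, target_kda, binder_kda):
    # Mass of a complex with the given number of target and binder copies.
    return n_target * target_kda + n_binder * binder_kda


def closest_stoichiometry(observed_kda, target_kda, binder_kda,
                          candidates=((1, 1), (1, 2), (2, 2))):
    # Pick the candidate (target copies, binder copies) whose expected
    # mass lies nearest to the observed mass.
    return min(
        candidates,
        key=lambda c: abs(
            expected_mass_kda(c[0], c[1], target_kda, binder_kda) - observed_kda
        ),
    )


if __name__ == "__main__":
    BET_V1_KDA = 18.5
    BINDER2_KDA = 29.3 - 18.5  # inferred monomer mass, about 10.8 kDa
    for n_t, n_b in ((1, 1), (1, 2), (2, 2)):
        mass = expected_mass_kda(n_t, n_b, BET_V1_KDA, BINDER2_KDA)
        print(f"{n_t}:{n_b} complex expected at {mass:.1f} kDa")
    print(closest_stoichiometry(27.8, BET_V1_KDA, BINDER2_KDA))
```

Under these assumptions, a complex containing a binder dimer would be expected near 40 kDa, so the observed 27.8 kDa is most consistent with one binder per allergen.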

Previously, a cocktail mix of three antibodies binding to three immunogenic epitopes of Bet v1 was developed to prevent allergic response29. Its cryogenic electron microscopy (cryo-EM) structure indicates that our binder targets a known epitope recognized by the REGN5713 antibody (Fig. 3g). To validate this, we immobilized REGN5713 on SPR and loaded the Bet v1 allergen on it. We observe a binding signal with REGN5714 as the analyte, but not with binder2, confirming that it targets an overlapping epitope with REGN5713 (Fig. 3h). We further proposed that our binders can compete with Bet v1 specific IgE present in serum samples from patients who are allergic to birch, similar to the REGN antibody mix29. To test the neutralization activity of our anti-Bet v1 binder2, we performed a blocking enzyme-linked immunosorbent assay (ELISA) using the serum of three patients allergic to birch with high titre of anti-Bet v1 IgE. Biotinylated Bet v1 was preincubated with either the REGN antibody cocktail or our designed binder2 (Fig. 3i). The REGN antibody mix blocked up to 90% of Bet v1 binding to IgE, whereas our single binder blocked up to 50% in two out of three donors. This is on par with blocking rates of single antibodies29, indicating that there is therapeutic potential for de novo designed binders in neutralizing allergic responses.

Modulating multi-domain nucleases

Nucleic acid interaction interfaces are widely considered undruggable due to their large, charged and convex surfaces, which are difficult to target with small molecules30. To test our pipeline on such interfaces, we focused on the multi-domain CRISPR–Cas9 nuclease from Streptococcus pyogenes (SpCas9). Although widely used for gene editing due to its programmability31, SpCas9 originally functions as a bacterial immune system protecting against phages32. In response, phages evolved small anti-CRISPR proteins (Acrs) that block CRISPR–Cas by occluding nucleic acid binding sites33. We wondered whether artificial Acrs could be designed to mimic this function.

We designed binders against the bipartite REC1 domain of SpCas9, containing a highly charged guide RNA-binding pocket34 (Fig. 4a). All six tested binders bound the full-length apo SpCas9 enzyme (Supplementary Fig. 5a). The top performing binder3 and 10 showed apparent binding affinities in the range of 300 nM by SPR, although complete titration curves were challenging to obtain. To validate their binding mode, we attempted to solve cryo-EM structures of binder3 and binder10 bound to the full-length SpCas9 apo enzyme. Despite the high quality of the data and clearly observable density for the binders (Extended Data Fig. 5a), we were unable to obtain a satisfactory cryo-EM density to build an atomic model due to poor resolution in the target area (Extended Data Fig. 5b). This observation could be because of the dynamic nature of the apo form of Cas9 (ref. 35). Nevertheless, we observe clear density at the REC1 site and can confidently dock both binders, validating the designed binding mode (Fig. 4b,c and Extended Data Fig. 5c,d).

Fig. 4: Targeting nucleic acid interactions with de novo binders against nucleic acid-guided multi-domain nucleases.

a, Zoom in on the SpCas9 REC1 domain with bound guide RNA (PDB 4ZT0). A designed binder is overlaid in the binding pocket. b, Cryo-EM structure of binder3 bound to the apo form of SpCas9. The REC1 domain is highlighted in green, the rest of SpCas9 is in grey. Cryo-EM density overlaid in grey. c, Cryo-EM structure of binder10 bound to the apo form of SpCas9. The REC1 domain is highlighted in green, the rest of SpCas9 is coloured in grey. Cryo-EM density overlaid in grey. d, SpCas9-based editing of HEK293T cells in the absence (grey bar, dashed line) or presence of designed binders (green bars) or natural Acrs (blue bars). e, Structural architecture of Clostridium butyricum Argonaute with bound gDNA and tDNA (PDB 6QZK). The PAZ domain and N + PIWI domains used as design targets are highlighted in light and dark blue. f, CbAgo-gDNA-mediated cleavage of target DNA in the absence (grey bar, dashed line) or presence of designed binders (green bars) or designed SpCas9 binders (blue bars). Bar plots represent the mean of n = 3 replicates, with standard deviation indicated by error bars. g, CbAgo-gDNA-mediated cleavage of target DNA in the absence of binders (grey line) or in the presence of designed binder2 (pink line) or binder3 (purple line). Plotted points represent an average of three measurements with standard deviation indicated by error bars.

To evaluate their function, we cotransfected human embryonic kidney 293T (HEK293T) cells with CRISPR–SpCas9 and either designed binders or natural Acrs36,37,38. We observe a significant reduction of gene editing activity in the presence of our designed binders (Fig. 4d). They outperform AcrIIC2, which inhibits guide RNA loading using a different targeting mechanism37. AcrIIA2 and AcrIIA4, which inhibit target DNA (tDNA) binding (Supplementary Fig. 5b), nearly eliminate gene editing activity, underscoring the differences in inhibitory strategies. These results demonstrate that BindCraft can generate previously unseen inhibitors of nucleic acid-interacting proteins by means of previously unseen mechanisms.

To expand our binder design to other large nucleases, we designed binders against the multi-domain Argonaute (Ago) nuclease from Clostridium butyricum (CbAgo). Akin to Cas9, CbAgo acts as an immune system that uses small oligonucleotide guides to target and cleave invading DNA39,40. So far, no natural inhibitors of Argonaute nucleases have been described. We designed binders targeting either the N-PIWI channel or the PAZ domain of CbAgo (Fig. 4e). We tested the effect of 12 binders on CbAgo-mediated tDNA cleavage and two binders strongly inhibit CbAgo activity (Fig. 4f). Whereas 0.4 μM CbAgo alone has a kcat of 0.004 s⁻¹, in presence of 2 μM binder2 and binder3 the kcat is reduced 80-fold to 5 × 10⁻⁵ s⁻¹ and 40-fold to 9.8 × 10⁻⁵ s⁻¹, respectively (Fig. 4g). We found that binder2 binds to CbAgo with a Kd of 5 nM, as determined by BLI (Extended Data Fig. 5e). SEC analysis of binder2 with CbAgo validates that it forms a stable complex with CbAgo (Extended Data Fig. 5f). Adding the guide DNA (gDNA) destabilizes the CbAgo–binder2 complex, which confirms that binder2 occupies the gDNA binding channel (Extended Data Fig. 5g,h).
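The reported fold reductions follow directly from the ratio of the uninhibited to the inhibited turnover rate. The short check below uses only the kcat values quoted above; the function name is an illustrative choice.

```python
def fold_reduction(kcat_uninhibited, kcat_inhibited):
    # Ratio of turnover rates; both values must share the same units.
    return kcat_uninhibited / kcat_inhibited


if __name__ == "__main__":
    KCAT_CBAGO_ALONE = 0.004   # per second
    KCAT_WITH_BINDER2 = 5e-5   # per second
    KCAT_WITH_BINDER3 = 9.8e-5 # per second
    print(f"binder2: {fold_reduction(KCAT_CBAGO_ALONE, KCAT_WITH_BINDER2):.1f}-fold")
    print(f"binder3: {fold_reduction(KCAT_CBAGO_ALONE, KCAT_WITH_BINDER3):.1f}-fold")
```

The ratios come out at about 80 for binder2 and about 41 for binder3, in line with the rounded 80-fold and 40-fold figures in the text.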

These results demonstrate that we can design protein binders even against challenging nucleic acid binding sites and grooves, potentially opening paths towards new types of protein-based therapeutic, gene editing modulator and molecular biology tool for basic research.

AAV retargeting for gene delivery

Viral vectors, such as those derived from adeno-associated viruses (AAVs), have expanded gene therapy possibilities by leveraging the natural ability of viruses to introduce genetic material into cells and tissues. However, AAVs have poor specificity to cell types, tissues and organs. Achieving specific targeting often requires high doses, raising the risk of off-target effects and immunogenicity. Several efforts have been made to modify the tropism of AAV capsids, by insertion of peptide segments41 or receptor-binding moieties, such as DARPins42. However, such approaches involve library screening or immunization campaigns, usually with limited control over the target site. We proposed that BindCraft could efficiently design miniprotein binders capable of retargeting AAVs to cell-type specific receptors (Fig. 5a). Its high design success rate could enable direct AAV transduction testing in cellulo, bypassing biochemical prescreening, and providing a platform for the rapid development of retargeted AAV vectors to cells and tissues of interest.

Fig. 5: Engineering targeted gene delivery by AAV.

a, Schematic representation illustrating AAV-cmv-GFP retargeting on genetic insertion of a cell-type receptor-specific miniprotein binder, replacing the natural primary attachment to cell-surface glycans. b, Chimeric assembly of a retargeted AAV particle, composed of the capsid proteins with (pink) and without (green) inserted binder in a defined stoichiometric ratio. c, Transduction efficiency measured by flow cytometry of different AAV variants targeting HER2 or PD-L1, determined after transfer of packaging cell supernatant onto HEK293 cells stably overexpressing the respective target receptors. The signal-to-noise ratio, defined as target/non-target ratio between the transduction rates measured on each cell line, is indicated as ‘×’ fold change. For comparison, each of the two cell lines is similarly transduced with the wild-type AAV6-cmv-GFP (WT) and the AAV capsid variant carrying knockout (KO) mutations. Transduction efficiencies were measured in triplicates (n = 3) and error bars indicate a 95% confidence interval. d, Design model of binder1 against HER2. e, Design model of binder202 against PD-L1. f, Heatmap of the transduction rates at a normalized multiplicity of infection (MOI) of 1 × 10⁵ vg per cell of the AAV variants carrying the binder1 against HER2 and binder202 against PD-L1, as well as the KO and WT controls, on HEK293 cells stably overexpressing the respective target receptors. g, Transduction with the PD-L1-targeting AAV carrying the binder202. The lower histogram shows that an anti-PD-L1 antibody, which targets the binding site of AAV-binder202, blocks the transduction of HEK293 cells stably overexpressing PD-L1. Panel a was created using BioRender (https://biorender.com).

Traditionally, retargeting molecules are either inserted into the variable regions VR-IV or VR-VIII protruding near the threefold symmetry axis of the AAV capsid, or fused to the N terminus of the viral capsid protein 2 (VP2). Based on a large mutational study on AAV capsid fitness43, we explored an alternative insertion site, located between residues 497 and 498 of the VR-V near the threefold symmetry axis of the AAV capsid (Fig. 5b). We chose AAV6-cmv-GFP as a starting vector and introduced point mutations to deplete its natural primary interactions with heparin and sialic acid (knockout, Fig. 5a). We then designed binders against HER2 and PD-L1 with an additional N- and C-termini distance loss to facilitate direct capsid integration, using a short –(GSG)1– extension on each terminus (Fig. 5b).

To simultaneously screen the designed AAVs for production and transduction efficiency, a small-scale assay was designed that relies on directly transferring the supernatant of AAV-packaging cells onto the targeted cells (Extended Data Fig. 6a,b). This assay led us to identify one reprogrammed AAV targeting HER2 and four targeting PD-L1 that showed enhanced specificity for HEK293 cells stably overexpressing their respective target receptor (Fig. 5c and Extended Data Fig. 6c). Characterization of the most efficient variants, HER2–b1 and PD-L1–b202 (Fig. 5d,e), showed that both AAVs had enhanced specificity towards cells expressing their target receptor (Fig. 5f). When the interaction was challenged with an antibody targeting the same receptor-binding site, the transduction of PD-L1-expressing cells by the PD-L1-targeting AAV was blocked, suggesting that the designed binder mediates the transduction through engagement with the target receptor (Fig. 5g).

Conclusions

The computational design of de novo PPIs has been a cornerstone problem in protein design, primarily due to our lack of understanding of the determinants of molecular recognition. The development of deep learning-based structure prediction networks, such as AF2 (refs. 1,11), has revolutionized the field and enabled more accurate filtering of de novo designs with favourable biochemical profiles. Here we introduce a robust pipeline based on backpropagation through the AF2 network, an approach that has been explored in previous studies7,8,9,10,44, and extend its capabilities to the hallucination of protein binders. Unlike most existing methods, BindCraft enables target protein flexibility, which is critical for capturing binding-induced structural changes essential for molecular recognition.

We demonstrate BindCraft’s performance across a diverse set of 12 challenging targets. The binder affinities lie predominantly in the nanomolar range, with success rates ranging from 10% to 100% and averaging 46.3%, which is remarkable for a purely computational approach. These rates allow functioning binders to be identified by experimentally screening far fewer designs than with the current state-of-the-art RFdiffusion5 and the recently described closed-source AlphaProteo binder design pipeline45. Notably, a binder designed with our pipeline recently ranked first in a community-wide binder design competition, showing 82 nM affinity against the challenging epidermal growth factor receptor (EGFR) target46.

The structural accuracy of our method not only allows us to create proteins that bind to defined surfaces but also enables their functionalization. We demonstrate this by using our designed binders to reduce the binding of birch allergen Bet v1 to specific IgE from patient-derived serum samples. Whereas a single binder showed moderate blocking activity, broader epitope coverage could enhance efficacy. De novo binders therefore offer a promising alternative to antibodies for such treatments, because of their high stability. However, owing to the synthetic nature of our binders and their relatively large size (60–240 amino acids), concerns about immunogenicity and effective delivery persist, although these issues are gradually being addressed in preclinical models47. Furthermore, BindCraft’s high experimental success rates allow direct screening of intended biological function, as exemplified by the retargeting of AAV towards specific cell-surface receptors, enabling precise and customizable transduction profiles. This promises to simplify the development of targeted viral vectors, offering a versatile platform for gene therapy applications, including therapeutic delivery to disease-relevant cells and tissues while minimizing the risk of potential off-target effects.

Despite the successes outlined here, there are limitations to the BindCraft design approach. Backpropagation through AF2 is GPU-intensive, and final design filtering with AF2 monomer in single sequence mode may exclude prospective high-affinity binders2,3,4,5,48 (Extended Data Fig. 7a,b). We assessed the possibility of using the recently released AlphaFold3 (ref. 49) model for filtering, but still found a large proportion of false positive predictions (Extended Data Fig. 7c). Furthermore, AF2 is known to be insensitive to point mutations50, which could be detrimental at PPI interfaces, but can be mitigated by orthogonal physics-based scoring methods, such as Rosetta51. Last, a potential limitation is the use of the AF2 i_pTM metric for the ranking of designs, which has emerged as a powerful binary predictor of binding activity (Extended Data Fig. 7a,b), but does not correlate with the interaction affinity46 (Supplementary Fig. 6). Nevertheless, BindCraft represents a significant leap in the accurate design of binders for direct functional applications. We foresee that through iterative refinement of our pipeline, we will eventually reach a ‘one design, one binder’ stage, omitting the need for screening. This will enable rapid generation of binders for applications in research, biotechnology and therapeutics for a wide range of research groups without protein design expertise.

Methods

BindCraft design protocol

The input and design settings for running the BindCraft pipeline are organized into user-friendly JSON files. To initiate design trajectories, a target PDB format structure needs to be specified, along with the desired minimum and maximum length of the binders, and the desired number of final filtered designs. A target hotspot can be specified as either individual residues or entire chains, or can be omitted completely in which case a binding site is selected according to the combined design loss.
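As an illustration of the inputs listed above, the following minimal Python sketch assembles and serializes a settings file. The function name, the key names and the file path are assumptions made for this example and should not be read as the exact schema of the released pipeline.

```python
import json


def make_target_settings(pdb_path, min_len, max_len, n_final, hotspot=None):
    """Build a settings dictionary for one design run.

    All key names are illustrative assumptions, not a confirmed schema.
    """
    if not (0 < min_len <= max_len):
        raise ValueError("binder lengths must satisfy 0 < min_len <= max_len")
    if n_final < 1:
        raise ValueError("at least one final design must be requested")
    return {
        "starting_pdb": pdb_path,            # target structure in PDB format
        "lengths": [min_len, max_len],       # minimum and maximum binder length
        "number_of_final_designs": n_final,  # stop once this many designs pass filters
        # None means no hotspot: the binding site is then left to the design loss.
        "target_hotspot_residues": hotspot,
    }


# Hypothetical example values.
settings = make_target_settings("target.pdb", 65, 150, 100, hotspot="A56")
settings_json = json.dumps(settings, indent=2)
```

Leaving `hotspot` unset corresponds to the case in which the binding site is selected according to the combined design loss.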

The binder hallucination process is performed using the ColabDesign implementation of AF2. The design process is initialized with a random sequence for the binder, which is predicted in single sequence mode, and a structural input template for the target. This is passed through the AF2 network to obtain a structure prediction and calculate the design loss. The design loss function is composed of several terms, with default weight values indicated in parentheses:

  (1) binder confidence pLDDT (weight 0.1)

  (2) interface confidence i_pTM (weight 0.05)

  (3) normalized predicted alignment error (pAE) within the binder (weight 0.4)

  (4) normalized predicted alignment error (pAE) between binder and target (weight 0.1)

  (5) residue contact loss within binder (weight 1.0)

  (6) residue contact loss between the target and binder: if hotspots are specified, the rest of the target is masked from this loss (weight 1.0)

  (7) radius of gyration of binder (weight 0.3)

  (8) ‘helicity loss’: penalize or promote backbone contacts every one in a three-residue offset to promote the hallucination of helical or non-helical designs (weight −0.3)

  (9) optional ‘N&C termini loss’ increases the proximity of the N and C termini of the binder to allow splicing into protein loops (weight 0.1).
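The way these terms combine can be sketched as a weighted sum. The snippet below is a minimal illustration that assumes each term has already been reduced to a scalar; the dictionary keys are hypothetical labels, and how each individual term is computed from the AF2 outputs is not shown.

```python
# Default weights as listed in the text; the keys are hypothetical labels.
DEFAULT_WEIGHTS = {
    "plddt": 0.1,
    "i_ptm": 0.05,
    "pae_intra": 0.4,
    "pae_inter": 0.1,
    "contacts_intra": 1.0,
    "contacts_inter": 1.0,
    "radius_of_gyration": 0.3,
    "helicity": -0.3,
    "termini": 0.1,  # optional N&C termini loss
}


def total_design_loss(terms, weights=DEFAULT_WEIGHTS, use_termini=False):
    """Return the weighted sum of scalar loss terms.

    `terms` maps each label to a scalar value. The termini term is
    skipped unless `use_termini` is True.
    """
    total = 0.0
    for name, weight in weights.items():
        if name == "termini" and not use_termini:
            continue
        total += weight * terms[name]
    return total
```

With every term set to 1.0, the sum is 2.65 without the termini term and 2.75 with it, which makes the relative contribution of each weight easy to check.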

The loss function is used to calculate position specific errors, which are then backpropagated through the AF2 network to produce a L × 20 error gradient, where L is the sequence length. Using multiple iterations and stochastic gradient descent optimization, this error gradient is recomputed and used to optimize the input binder sequence for the next iteration to minimize the resulting loss. We backpropagate through the AF2 multimer model weights11 and swap randomly between the five trained models at each iteration to ensure robust sequence generation and reduce the risk of overfitting to a single model.

As our goal is to arrive at a real discrete sequence for the binding interface, the sequence optimization is performed in four stages. The first sequence optimization stage is performed in a continuous sequence space using logit inputs. At each step, the sequence representation is based on the linear combination (1 − λ) × logits + λ × softmax(logits/T), where λ = (step + 1)/iterations and the temperature (T) is 1.0. Here, many amino acids are considered at each binder position, which allows the exploration of a larger and less constrained sequence-structure space. After 50 iterations, we terminate trajectories showing poor AF2 confidence scores, as we found that such trajectories rarely converge to high confidence designs. Furthermore, if a beta-sheeted trajectory is detected, we increase the number of recycles during design from one to three to ensure accurate prediction. The continuous sequence space optimization is then continued for a further 25 iterations. During the second optimization stage, the sequence logits are normalized to sequence probabilities using the softmax function for 45 iterations to funnel the design space towards a more realistic sequence representation defined as softmax(logits/T). At each step, the temperature is lowered according to T = 1 × 10^−2 + (1 − 1 × 10^−2) × (1 − (step + 1)/iterations)^2. The temperature is also used to scale the learning rate for rate decay. For the third stage, we implement the straight-through estimator, allowing the model to see the one-hot representation, but backpropagate through the softmax representation. This procedure is performed for five iterations. For the final fourth stage, the sequence inputs are converted to a one-hot discrete encoding. At each step, X random mutations are independently sampled and tested from the probability distribution of the softmax representation from the previous stage, and mutations with the best loss are fixed. X is defined on the basis of the length of the binder sequence (0.05 × binder length). This procedure is performed for 15 iterations. At the end, trajectories with pLDDT below 0.7, fewer than 7 interface contacts or significant backbone clashes are rejected.
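The schedules used in these stages can be written out directly. The following Python sketch reproduces the mixing coefficient λ, the softmax temperature and the number of mutations per step from the expressions in the text. The function names are hypothetical, and rounding X to the nearest integer (with a minimum of one) is an assumption, because the text only gives 0.05 × binder length.

```python
def soft_mix_lambda(step, iterations):
    """Stage 1 mixing coefficient: lambda = (step + 1) / iterations."""
    return (step + 1) / iterations


def softmax_temperature(step, iterations, t_min=1e-2):
    """Stage 2 temperature: t_min + (1 - t_min) * (1 - (step + 1)/iterations)^2."""
    frac = (step + 1) / iterations
    return t_min + (1.0 - t_min) * (1.0 - frac) ** 2


def mutations_per_step(binder_length, fraction=0.05):
    """Stage 4 mutation count X; rounding and the floor of 1 are assumptions."""
    return max(1, round(fraction * binder_length))


# Temperature over the 45 softmax iterations, decaying from ~0.96 to 0.01.
temperatures = [softmax_temperature(step, 45) for step in range(45)]
```

The temperature decays quadratically, so most of the sharpening towards a near one-hot distribution happens in the later iterations of the second stage.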

Successful binder design trajectories are subjected to MPNNsol sequence optimization to improve stability and solubility12. To this end, we preserve binder residues in a 4 Å radius around the target interface, and design 20 new sequences for the remaining binder core and surface residues using the soluble weights of ProteinMPNN6, with a temperature of 0.1 and 0.0 backbone noise. These optimized sequences are then repredicted using the AF2 monomer model, with three recycles and two template-based models49 in single sequence mode, to ensure robust and unbiased complex assessment. Each of the two resulting models is then energy minimized using Rosetta’s FastRelax protocol52 with 200 iterations, and interface scores are computed using the InterfaceAnalyzer mover53 with side-chain and backbone movement enabled.

Designs are finally filtered using a set of predefined filters to ensure the selection of high quality designs for experimental testing. Filters were initially defined based on experimental observations from previous binder design studies2,3,4,5 and refined over the course of this work. These include:

  (1) AF2 confidence pLDDT score of the predicted complex (>0.8)

  (2) AF2 interface predicted confidence score (i_pTM) (>0.5)

  (3) AF2 interface predicted alignment error (i_pAE) (<0.35)

  (4) Rosetta interface shape complementarity (>0.60)

  (5) number of hydrogen bonds at the interface (>3)

  (6) number of unsaturated hydrogen bonds at the interface (<4)

  (7) hydrophobicity of binder surface (<35%)

  (8) r.m.s.d. of binder predicted in bound and unbound form (<3.5 Å)

  (9) fewer than three lysines and methionines at the binder interface.
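These thresholds can be applied as a simple table of comparisons. The sketch below is illustrative only: the metric keys are hypothetical labels, surface hydrophobicity is assumed to be expressed as a fraction (0.35 for 35%), and the lysine and methionine criterion is assumed to refer to their combined count at the interface.

```python
import operator

# (metric key, comparison, threshold); keys are hypothetical labels.
FILTERS = [
    ("plddt", operator.gt, 0.8),
    ("i_ptm", operator.gt, 0.5),
    ("i_pae", operator.lt, 0.35),
    ("shape_complementarity", operator.gt, 0.60),
    ("interface_hbonds", operator.gt, 3),
    ("interface_unsat_hbonds", operator.lt, 4),
    ("surface_hydrophobicity", operator.lt, 0.35),  # assumed fraction, not percent
    ("binder_rmsd", operator.lt, 3.5),              # in angstroms
    ("interface_lys_met", operator.lt, 3),          # assumed combined K + M count
]


def failed_filters(metrics):
    """Return the keys of all filters that the design does not satisfy."""
    return [
        key
        for key, compare, threshold in FILTERS
        if not compare(metrics[key], threshold)
    ]


def passes_filters(metrics):
    """A design passes only if no filter fails."""
    return not failed_filters(metrics)
```

Returning the list of failed filters, rather than a single boolean, makes it easier to see which criterion rejects most designs for a given target.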

We allow only two MPNNsol generated sequences per individual AF2 trajectory to pass filters to promote interface diversity amongst selected binders. This design procedure is set up to loop until a defined number of final desired designs is reached. For optimal results, we recommend running the design pipeline until at least 100 designs pass computational filters. This generally requires the sampling of about 300–3,000 trajectories. We then usually pick 10 designs from the top 20 (ranked by i_pTM) for experimental testing.
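The per-trajectory cap and the i_pTM ranking can be sketched as follows. The record fields `trajectory` and `i_ptm` are hypothetical names, and the final choice of 10 designs from the top 20 is left to the user, because the text does not specify how that choice is made.

```python
from collections import defaultdict


def cap_per_trajectory(designs, max_per_trajectory=2):
    """Keep at most `max_per_trajectory` passing designs per AF2 trajectory.

    Designs are kept in the order in which they are supplied.
    """
    counts = defaultdict(int)
    kept = []
    for design in designs:
        if counts[design["trajectory"]] < max_per_trajectory:
            counts[design["trajectory"]] += 1
            kept.append(design)
    return kept


def top_ranked(designs, n=20):
    """Return the `n` designs with the highest i_pTM, best first."""
    return sorted(designs, key=lambda d: d["i_ptm"], reverse=True)[:n]


# Hypothetical records for designs that already passed all filters.
passing = [
    {"name": "t1_mpnn1", "trajectory": "t1", "i_ptm": 0.82},
    {"name": "t1_mpnn2", "trajectory": "t1", "i_ptm": 0.79},
    {"name": "t1_mpnn3", "trajectory": "t1", "i_ptm": 0.91},
    {"name": "t2_mpnn1", "trajectory": "t2", "i_ptm": 0.88},
]
candidates = top_ranked(cap_per_trajectory(passing), n=20)
```

In this example the third sequence from trajectory `t1` is dropped despite its high i_pTM, which is the intended effect of the cap: interface diversity is favoured over taking several near-identical interfaces.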

To generate designs against targets described in the section ‘Accurate design of de novo binders’, we used the input structures, binder specifications and hotspot designations described in Supplementary Table 1. For AF2 predictions, we used full-length input sequences from UniProt. In all cases, the amino acid cysteine was excluded from sequence design. For AAV targets, the N-termini and C-termini loss is activated with default weight.

Computational benchmarks of BindCraft

To evaluate the flexibility of the target structure post-design, the input PDB structure of the target was aligned to the target chain A of the design trajectory, and r.m.s.d. was calculated using PyRosetta. For increasing target flexibility, the sequence of the input target template was masked by enabling the flag ‘rm_target_seq’ in ColabDesign for trajectory hallucination54, and 200 trajectories were generated.

For the impact of the helicity loss on binder secondary structure composition, the ‘weights_helicity’ flag in BindCraft was set to 1, 0, −0.3, −1, −2 and −3, and 200 trajectories were generated for each instance using otherwise default settings.

To compare the design capabilities of AF2 monomer and multimer weights, we generated 200 trajectories each. For AF2 multimer trajectories, we used the default settings in which AF2 multimer models 1–5 are used for design and AF2 monomer models 1–2 trained with templates are used for reprediction. For AF2 monomer trajectories, this is inverted: we use AF2 monomer models 1 and 2 for design and AF2 multimer models 1–5 for reprediction.

For benchmarks involving design and trajectory success rates, we run the design pipeline either for 200 trajectories or until 100 designs passing in silico filters are accumulated (where indicated). We then designate trajectories with pLDDT above 0.7 as ‘passing’, whereas trajectories that have a pLDDT below 0.7, more than 1 Cα backbone clash between chains or fewer than 3 contacts between the binder and target are designated ‘low confidence’.
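The trajectory designation can be expressed as a single rule. This is a minimal sketch with hypothetical argument names. A pLDDT of exactly 0.7 is not covered by the description, and treating it as passing here is an assumption.

```python
def classify_trajectory(plddt, ca_clashes, contacts):
    """Label a trajectory as 'passing' or 'low confidence'.

    Low confidence: pLDDT below 0.7, more than one C-alpha backbone clash
    between chains, or fewer than three binder-target contacts.
    """
    if plddt < 0.7 or ca_clashes > 1 or contacts < 3:
        return "low confidence"
    return "passing"
```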

RFdiffusion benchmarks were performed as described in the original publication5, with the exception of running the pipeline in deterministic mode for tracking purposes. Briefly, backbones of designated lengths were sampled using RFdiffusion against selected targets and sequences were designed using original ProteinMPNN weights with a temperature of 0.0001 and 8 sequences per backbone. Each complex was predicted using AF2 monomer model 1 and two MPNN designed sequences for each backbone were allowed to pass filters as defined in the original publication (pLDDT > 0.8, i_pAE < 0.32, binder r.m.s.d. < 1.0 Å). The pipeline was run until 100 designs passed filters. The computational time was calculated as backbone generation time + ProteinMPNN sequence generation + AF2 complex prediction for each design. Notably, although single model prediction was used in the case of RFdiffusion, we used prediction using two template-based AF2 models in the case of BindCraft.

Pairwise structural similarities and sequence identities across targets and binders in Supplementary Data 2 were extracted using Foldseek55 exhaustive search and TMalign alignment type.

To determine fold and interface novelty of designed binder complexes, we searched the binder chain against the PDB using Foldseek in TMalign mode. Hits with the highest template modelling score (qtmscore) and their sequence identities (fident) for each binder were plotted. Owing to the low resolution structural representations in Foldseek, an alternative strategy was used to assess interface novelty. Residues were extracted using PPIRef in a 6 Å radius around the designed interface, then searched against the precomputed PDB interaction pairs using the iDist method, with a default threshold of 0.04 (ref. 14). The closest interface hit is then aligned using USalign to calculate the template modelling score and sequence identity15.

Benchmarking of designs from other design pipelines was performed using the BindCraft prediction method of either AF2 monomer or multimer in single sequence, with templates provided for the target according to the specifications in their respective publications.

AlphaFold3 predictions of designed BindCraft complexes were performed using the AlphaFold3 server49 with multiple-sequence alignments and templates enabled.

Pairwise Pearson correlation coefficients (r) among experimental binding (yes, 1; no, 0), affinity (nanomolar), length and all AF2 and Rosetta-derived features were computed and visualized as a heatmap to assess linear relationships and correlation across all pairs of values. Coefficient values outlined in the cells are considered significant at |r| ≥ 0.7.
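For reference, the Pearson coefficient between a binary binding outcome and a continuous feature can be computed as in the following sketch. The function names and the example values are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def is_outlined(r, threshold=0.7):
    """True if |r| meets the threshold used to outline heatmap cells."""
    return abs(r) >= threshold


# Hypothetical values: binding encoded as 1 (yes) or 0 (no) against a score.
binding = [0, 0, 1, 1]
score = [0.2, 0.4, 0.6, 0.8]
r = pearson_r(binding, score)
```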

Protein expression, purification and characterization

DNA sequences of designed proteins, as well as BBF-14, Der f7, Der f21 and Bet v1 targets were ordered from Twist Biosciences with Gibson cloning adaptors for cloning into bacterial expression vectors pET21b or pET11. Proteins were expressed in Escherichia coli BL21 Codon Plus (DE3) cells (Novagen) by inducing with 0.5 mM isopropyl-β-d-thiogalactoside for 6 h at 18 °C. Pellets were resuspended and lysed in lysis buffer (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 5% glycerol, 1 mg ml−1 lysozyme, 1 mg ml−1 phenylmethylsulfonyl fluoride and 1 µg ml−1 DNase) using sonication. Cell lysates were clarified using ultracentrifugation, loaded on a 1 ml Ni-NTA Superflow column (Qiagen) and washed with 7 column volumes of 50 mM Tris-HCl pH 7.5, 500 mM NaCl and 10 mM imidazole. Proteins were eluted with 10 column volumes of 50 mM Tris-HCl pH 7.5, 500 mM NaCl, 500 mM imidazole. Claudin binders were dialysed against 20 mM HEPES pH 8.0, 150 mM NaCl, 4% glycerol and directly frozen.

The Fc-fused PD-L1 target3, IFNAR2 target, IFNA2 cytokine and antibodies were expressed using a mammalian Expi293 secreted expression system (Thermo Fisher Scientific, A14635). Six days posttransfection, the supernatants were collected, cleared and purified either using a 1 ml Ni-NTA Superflow column (Qiagen) or protein A affinity column (Qiagen). SAS-6 (ref. 22), SpCas9 (ref. 56), CbAgo and the catalytic mutant of CbAgo (D541A, D611A)40 have been purified as described previously.

Remaining bacterial and mammalian expressed proteins were then concentrated and injected onto a Superdex 75 16/600 or Superdex 75 10/300 gel filtration column (GE Healthcare) in 50 mM Tris-HCl pH 7.5, 250 mM KCl or PBS. Proteins after size exclusion were concentrated, frozen in liquid nitrogen and stored at −80 °C. Molar mass, sample homogeneity and multimeric state were confirmed using SEC–MALS (miniDAWN TREOS, Wyatt) by injecting 100 µg of protein in PBS (Column, Superdex 75 10/300 or Superdex 200 10/300, GE Healthcare). Folding, secondary structure content and melting temperatures were assessed using circular dichroism in a Chirascan V100 instrument from Applied Photophysics in PBS at a concentration of 0.1–0.3 mg ml−1.

Expression and purification of PD-1 target and binders

DNA sequences were synthesized in the pcDNA3.4 vector with an osteonectin secretion signal at the N terminus (Twist Biosciences). De novo designs were fused to the N terminus of human IgG1 Fc. The extracellular domain (25–167) of human PD-1 (UniProtKB Q15116) was fused to a C-terminal AviTag and His tag. Plasmid DNA was prepared from glycerol stocks (Twist Biosciences) using Cowin Biosciences GoldVac EndoFree plasmid maxi kit. Plasmids were transfected into 3 ml or 50 ml cultures of Expi293F (Gibco) cells as per the manufacturer’s recommendations. Cells were incubated at 37 °C for 4–5 days before collection. Following protein expression, the cell culture supernatant was filtered through a 0.22-µm filter and purified using MabSelect protein A affinity chromatography resin (Cytiva). The column was washed with PBS and the protein was eluted in Tris glycine buffer pH 2.5. Following elution, proteins were dialysed into PBS using a 10-kDa molecular weight cut-off dialysis cassette. For production of biotinylated PD-1 protein, the PD-1 plasmid was cotransfected with BirA plasmid (2:1 ratio). The BirA plasmid contains the BirA sequence (UniProtKB P06709) with a C-terminal Flag tag in the pcDNA3.4 vector.

Binding characterization of PD-1

Designs were initially screened for binding to biotinylated human PD-1 or a random protein using BLI (Sartorius OctetRED384). Biotinylated human PD-1 protein and biotinylated lysozyme (GeneTex) were prepared at 500 nM in PBS containing 0.1% bovine serum albumin (BSA) (PBSA). The designs were diluted to 5 µM in PBSA. Streptavidin-labelled biosensors were saturated with either biotinylated human PD-1 or biotinylated chicken lysozyme. The designs were then allowed to associate with the immobilized ligand for 60 s, followed by a dissociation step in PBSA. The baseline subtracted signal (nanometres) was calculated and used to prioritize human PD-1 specific binders for further characterization.

To determine the affinity of selected designs, 100 nM biotinylated human PD-1 prepared in PBSA was immobilized onto a streptavidin-labelled biosensor for 15 s. Serial dilutions of the designs (from 2.5 µM to 5 nM) were then allowed to associate with the immobilized ligand for 180 s, followed by a dissociation step in PBSA for 300 s. Following background subtraction of the BLI binding curves using the buffer only (PBSA) curve, the Kd was determined using the 1:1 model in the Data Analysis HT v.11.1 curve fitting module.

To determine whether the designed protein competed with pembrolizumab for binding to PD-1, 100 nM biotinylated human PD-1 in PBSA was immobilized onto streptavidin coated biosensors for 15 s. An initial association with 200 nM pembrolizumab prepared in PBSA was performed for 180 s, followed by a second association with 200 nM design prepared in PBSA for 180 s.

SPR binding and competition assays

SPR measurements were performed using the Biacore 8K system (Cytiva) in HBS-EP+ buffer (10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, 0.005% (v/v) Surfactant P20, GE Healthcare). Target proteins were immobilized on a CM5 chip (GE Healthcare) through amide coupling in 10 mM NaOAc pH 4.5 for 130–250 s at a flow rate of 10 µl min−1 aiming for 100 relative response units. Designed binders or control proteins were injected as analytes in either a single 10 µM concentration during binder prescreening or in serial dilutions to assess binding kinetics. These were injected at a flow rate of 30 µl min−1 for a varying contact time, followed by dissociation. If necessary, the chip surface was regenerated after each injection using 10 mM Glycine-HCl pH 2.5 for 30 s at a flow rate of 30 µl min−1. Binding curves were fitted with a 1:1 Langmuir binding model in the Biacore 8K analysis software. Steady-state response units were plotted against analyte concentration and a sigmoid function was fitted to the experimental data in Python v.3.9 to derive the Kd.
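A steady-state fit of this kind can be sketched in plain Python. The exact sigmoid function used for fitting is not specified here, so the example assumes the 1:1 steady-state isotherm R = Rmax × C/(Kd + C), which is sigmoidal on a logarithmic concentration axis. It also replaces a general curve-fitting routine with a grid search over log10(Kd), solving Rmax in closed form at each grid point. The function name, the search range and the example data are assumptions.

```python
def fit_steady_state_kd(concs, responses, log10_kd_min=-12.0,
                        log10_kd_max=-3.0, n_grid=9001):
    """Least-squares fit of R = Rmax * C / (Kd + C).

    `concs` are positive analyte concentrations in molar units and
    `responses` are steady-state response units. Returns (Kd, Rmax).
    """
    best = None
    for i in range(n_grid):
        log_kd = log10_kd_min + (log10_kd_max - log10_kd_min) * i / (n_grid - 1)
        kd = 10.0 ** log_kd
        occupancy = [c / (kd + c) for c in concs]
        # For a fixed Kd the model is linear in Rmax, so Rmax has a closed form.
        rmax = (sum(o * r for o, r in zip(occupancy, responses))
                / sum(o * o for o in occupancy))
        sse = sum((r - rmax * o) ** 2 for o, r in zip(occupancy, responses))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


# Synthetic, noise-free example: Kd = 50 nM, Rmax = 100 response units.
concs = [1e-9, 3e-9, 1e-8, 3e-8, 1e-7, 3e-7, 1e-6, 1e-5]
responses = [100.0 * c / (50e-9 + c) for c in concs]
kd_fit, rmax_fit = fit_steady_state_kd(concs, responses)
```

The grid spacing of 0.001 log units bounds the precision of the recovered Kd; for real data a standard nonlinear least-squares routine would normally be preferred.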

Competition assays were performed as follows. For PD-L1 and IFNAR2, target receptors were immobilized, and binders and competitors were injected as analytes. Two subsequent injections were performed with only competitor (A, 1 µM), only design (B, 1 µM) or first competitor (A, 1 µM) and then design + competitor (A + B, both 1 µM). For Bet v1, REGN5713 (antibody format) was immobilized on the SPR chip and in a first injection (1) loaded with Bet v1 allergen (1 µM), before either REGN5714 (Fab format) or Birch–binder2 were injected (both 1 µM) (2).

Cell-surface specificity measurements

For specificity measurements, PD-1–b4 was expressed and purified as a His-tagged protein. PD-1-Fc was produced with mutations at glycosylation sites (N → D) and free cysteine residues (C → S). All other proteins were purified as previously described.

BLI experiments were performed using a Gator BLI system and GatorOne software (Gator Bio, v.2.7.3.0728). Assays were conducted in a running buffer containing 10 mM HEPES (pH 7.4), 150 mM NaCl, 3 mM EDTA and 0.005% (v/v) Surfactant P20 (GE Healthcare).

For immobilization, Fc-tagged target proteins (PD-L1, PD-1 and IFNAR2) were diluted to 5 µg ml−1 and captured onto protein A biosensor tips (Gator Bio). After immobilization, the biosensor tips were dipped into 1 µM solutions of purified binder.

Protein crystallization and structure determination

The BBF-14–binder4 complex was crystallized at a concentration of 5 mg ml−1 using sitting drop vapour diffusion at 16 °C in 0.1 M MES pH 6.0, 0.2 M sodium acetate trihydrate, 20% w/v polyethylene glycol (PEG) 8000 buffer (SG1-Eco Screen, Molecular Dimensions). The Der f7–binder2 complex in P21 crystal form was crystallized at a concentration of 15 mg ml−1 using sitting drop vapour diffusion at 16 °C in 0.1 M MES pH 6.5, 0.2 M KSCN, 25% w/v PEG 2000 MME buffer (Clear Strategy Screen I, Molecular Dimensions). The Der f7–binder2 complex in C121 crystal form was crystallized at a concentration of 15 mg ml−1 using sitting drop vapour diffusion at 16 °C in 0.1 M MES pH 6.5 and 20% v/v PEG smear high BCS (BCS Screen, Molecular Dimensions). The Der f21–binder10 complex was crystallized at a concentration of 30 mg ml−1 using sitting drop vapour diffusion at 16 °C in 0.1 M sodium citrate pH 5.6, 1.0 M LiSO4, 0.5 M NH4SO4 buffer (SG1-Eco Screen, Molecular Dimensions). Crystals were cryoprotected in 25% glycerol and flash-cooled in liquid nitrogen. Diffraction data were collected at the European Synchrotron Radiation Facility MASSIF-3 and ID30B beamlines, Grenoble, France at a temperature of 100 K. Crystallographic data were processed using the autoPROC package57. Phases were obtained by molecular replacement using Phaser58. Atomic model refinement was completed using COOT59 and Phenix.refine58. The quality of refined models was assessed using MolProbity60. Structural figures were generated using ChimeraX61.

Cryo-EM structure determination

SpCas9 was mixed with a threefold excess of either binder3 or binder10, and the complex was purified using an S200 10/300 gel filtration column (GE Healthcare) in 20 mM Tris-HCl pH 7.5, 250 mM KCl. The purified complex was applied to a glow-discharged 300-mesh holey carbon grid (Au 1.2/1.3, Quantifoil Micro Tools), blotted for 4 s at 95% humidity, 10 °C, plunge frozen in liquid ethane (Vitrobot Mark IV, FEI) and stored in liquid nitrogen. Data collection was performed on a 300 kV Titan Krios G4 microscope equipped with a FEI Falcon IV detector and SelectrisX energy filter. Micrographs were recorded at a magnification of ×165,000, pixel size of 0.726 Å and a nominal defocus ranging from −0.8 µm to −2.2 µm.

Acquired cryo-EM data were processed using cryoSPARC v.4.6.0 (ref. 62). Micrographs were patch motion corrected, and micrographs with a resolution estimation worse than 5 Å were discarded after patch contrast transfer function estimation. Initial particles were picked using blob picker at 90–135 Å. Particles were extracted with a box size of 360 × 360 pixels, downsampled to 220 × 220 pixels. After two-dimensional classification, clean particles were used for ab initio three-dimensional (3D) reconstruction and initial non-uniform 3D reconstruction63. This model was used for extra template-based picking of particles. Following several rounds of 3D classification, in which classes containing unbound Cas9 were excluded, the class with the most detailed binder features was re-extracted using full box size and subjected to non-uniform and local refinement to generate final reconstructions. The local resolution was calculated and visualized using ChimeraX61. The in silico models were docked into density using ChimeraX61.

Birch allergen blocking assay

Anti-Bet v1 binder blocking capacity was assessed by first coating NuncSorp (Thermo Fisher) plates with 2 μg ml−1 of anti-human IgE monoclonal antibody (NBS-C BioScience; clone Le27; 0908-1-010) in coating buffer (15 mM Na2CO3, 34.87 mM NaHCO3) and incubating overnight at 4 °C. The plates were washed with PBS + 0.05% Tween and blocked using PBS + 1% BSA for 2 h at room temperature. Then, sera from patients allergic to birch were added at a concentration of 4 ng ml−1 of anti-Bet v1 IgE. Biotinylated Bet v1 allergen at 1 nM concentration was preincubated for 2 h at room temperature with fourfold serial dilutions of the Bet v1–binder2 starting at 2 μM or with fivefold serial dilutions of the cocktail of REGN5713, REGN5714 and REGN5715 (starting at 50 nM each) and then added to the IgE-coated plate. After 2 h of incubation at room temperature, the plates were washed with PBS + 0.05% Tween and streptavidin horseradish peroxidase (BD Pharmingen; 554066; 1:1,000 dilution) was added and incubated for 1 h. Plates were washed and tetramethylbenzidine substrate (BD Biosciences; 555214) was added and incubated for another 20 min. The reaction was stopped with 2 M sulfuric acid. Absorbance was measured on a spectrophotometer at 450 nm with a 630-nm reference, and blocking percentage was measured by subtracting the absorbance of the sample in the absence of the binder.

MST

CLDN1 WT was labelled with Cy5 by adding a 1:5 molar excess of dye and incubating for 2 h on ice. The excess dye was removed by passing through a PD-10 column. The labelled protein was collected and stored in small aliquots at −80 °C after flash freezing in liquid nitrogen.

For MST-based interaction studies, the Monolith (Nanotemper) instrument was used. Serial dilutions of the ligand (CLDN1–b12/CpE–Nd33) were made in buffer B (25 mM HEPES pH 8.0, 200 mM NaCl, 5% glycerol 0.03% DDM) and mixed with 10 nM labelled CLDN1 WT. After 10 min of incubation, samples were transferred to capillaries (Monolith standard capillary) and readings were initiated. The spectral shift data were plotted and fitted into a Kd model, and estimated Kds were obtained. When data were not fitted using the Kd model, the Hill model was used to fit data. For studying the competitive binding of CpE–Nd33 and CLDN1–b12 to the target CLDN1 WT, a second set of experiments was performed. CLDN1 WT was incubated with CLDN1–b12 (2 × Kd) and subsequently challenged with CpE–Nd33.

Cytotoxicity assay

To study whether claudin binders were able to inhibit pore formation in Sf9 cells expressing claudins, adherent Sf9 cells in a 24-well plate were infected with baculovirus containing either CLDN1 or CLDN4. The assay was performed as shown previously20. Briefly, for each claudin, a 12-well experiment was performed. Six wells were used to test the effect of binders on the pore-forming capacity of CpE–Nd33 and the other six wells were used as controls. After 36 h of infection, 4 µM of each binder was added into six different wells and the plate was then gently mixed by swirling and incubated for 5 min. After that, 300 nM of CpE–Nd33 was added to each of the six wells. The following controls were used in the experiment: (1) Sf9 without baculovirus infection; (2) Sf9 infected with claudin but not treated with CpE–Nd33; (3) Sf9 infected with claudin and treated with CpE–Nd33; (4) Sf9 infected with claudin and treated with COP4 Fab (referred to as CpE inhibitor); and (5) Sf9 infected with claudin and treated with COP4, followed by addition of CpE–Nd33 after incubation for 5 min. The numbers of dead and live cells were then measured after 18 h of incubation by staining the cells with trypan blue and counting them using an automated cell counter (Invitrogen Countess).

SpCas9 gene editing

For SpCas9-single-guide RNA (sgRNA) plasmid cloning, lentiCRISPR v2 (Addgene no. 52961, a gift from F. Zhang) was digested with BsmBI (NEB). Oligonucleotides encoding the sgRNA targeting the NSD2 gene were annealed and ligated into the digested lentiCRISPR v2 plasmid. All binders were human-codon optimized using the GenSmart Codon Optimization tool and ordered from Twist Bioscience as inserts with homology overhangs for cloning. Final binder plasmids were generated by isothermal assembly (NEBuilder HiFi DNA Assembly Cloning Kit, NEB).

HEK293T (ATCC CRL-3216) cells were maintained in DMEM plus GlutaMax (Thermo Fisher Scientific), supplemented with 10% (vol/vol) fetal bovine serum (Sigma-Aldrich) and 1 × penicillin-streptomycin (Thermo Fisher Scientific) at 37 °C and 5% CO2. Cells were maintained at confluency below 90% and passaged every 2–3 days. For testing inhibitor efficiency, HEK293T cells were seeded in 48-well cell culture plates (Greiner) and transfected at 70% confluency using 300 ng Cas9 + sgRNA plasmid, 500 ng of inhibitor plasmid and 5 µl of Lipofectamine 2000 according to the manufacturer’s instructions (Thermo Fisher Scientific). The next day, cells were split and selected with puromycin, blasticidin or both. Three days post-transfection, cells were collected and genomic DNA was isolated by direct lysis.

The DNA from the cell lysate was prepared for next-generation sequencing as previously described64. In the first PCR round, genomic regions of interest were amplified using GoTaq Green Master Mix (Promega) and primers that included Illumina forward and reverse adaptor sequences. A second PCR round, also using GoTaq Green Master Mix (Promega), introduced p5–p7 barcodes into the products from the first round. The resulting amplicons were pooled and quantified using a Qubit 3.0 fluorometer (Invitrogen). The libraries were then sequenced using a MiSeq platform (Illumina, 150 bp, paired-end). Sequencing data and the resulting gene editing insertion-deletion rates were analysed using CRISPResso2 (ref. 65).

CbAgo in vitro cleavage assay

For in vitro cleavage assays, binders, CbAgo, 5′-phosphorylated 16-nt single-stranded DNA (ssDNA) guide (oDS423) and Cy5-labelled 45-nt ssDNA target (oDS401) were mixed to final concentrations of 2:0.4:0.4:0.2 μM in 10 mM HEPES pH 7.5, 125 mM KCl and 2 mM MgCl2. To this end, the binder protein and CbAgo were first mixed and incubated at 37 °C for 15 min, after which the mixture was placed on ice and guide ssDNA and target ssDNA were added. Subsequently, reaction mixtures were incubated at 37 °C, and samples were taken at 0-min, 4-min, 10-min, 30-min and 60-min timepoints. Samples taken at each timepoint were directly quenched by adding 2× RNA loading dye (25 mM EDTA, 5% v/v glycerol, 90% v/v formamide) and heating for 5 min at 95 °C. Cleavage products were resolved using denaturing (7 M urea) 20% polyacrylamide gel electrophoresis, and gels were imaged using an Amersham Typhoon gel scanner (Cytiva Life Sciences). Cleavage reactions were performed in triplicate for each binder protein. CbAgo target cleavage was quantified using ImageQuant TL 1D v.8.2.0 (Cytiva Life Sciences) and fitted by nonlinear least squares (nlsLM from the R package minpack.lm) to a double-exponential decay model describing initial (fast) and turnover (slow) cleavage:

$${\rm{cleavage}}=A\left(1-\exp \left(-\frac{{\rm{time}}}{{K}_{1}}\right)\right)+B\left(1-\exp \left(-\frac{{\rm{time}}}{{K}_{2}}\right)\right)$$

If fitting to a double-exponential decay model yielded no fit after 1,024 iterations with residuals and gradient convergence tolerance of 1 × 10⁻⁹, the turnover cleavage (slow) was considered negligible and a single-exponential decay model (that is, B = 0) was used:

$${\rm{cleavage}}=A\left(1-\exp \left(-\frac{{\rm{time}}}{{K}_{1}}\right)\right)$$

For all samples, Kcat was calculated from the fit constants for the initial rate (A and K1):

$${K}_{{\rm{cat}}}=\frac{A\times [{\rm{target}}]}{60\times {K}_{1}\times [{\rm{CbAgo}}]}$$

The mean and standard deviation of Kcat were calculated using the three experimental replicates.
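
The calculation above can be sketched as follows. This is a minimal illustration, not the authors' R analysis: it assumes that A is the fraction of target cleaved in the fast phase and that K1 is a time constant in minutes (so that the factor of 60 converts to per second). The fit constants in `replicate_fits` are hypothetical; the target and CbAgo concentrations are the final assay concentrations given in the text.

```python
import math
import statistics


def double_exponential(time_min, a, k1, b, k2):
    """Cleavage model with a fast (a, k1) and a slow turnover (b, k2) phase."""
    fast = a * (1.0 - math.exp(-time_min / k1))
    slow = b * (1.0 - math.exp(-time_min / k2))
    return fast + slow


def single_exponential(time_min, a, k1):
    """Fallback model used when the slow phase is negligible (B = 0)."""
    return a * (1.0 - math.exp(-time_min / k1))


def kcat_per_second(a, k1_min, target_um=0.2, cbago_um=0.4):
    """Kcat from the fast-phase constants, following the formula in the text."""
    return (a * target_um) / (60.0 * k1_min * cbago_um)


# Hypothetical (A, K1) fit constants from three replicate reactions.
replicate_fits = [(0.50, 2.0), (0.48, 2.2), (0.52, 1.9)]

kcats = [kcat_per_second(a, k1) for a, k1 in replicate_fits]
mean_kcat = statistics.mean(kcats)
sd_kcat = statistics.stdev(kcats)  # sample standard deviation (n - 1)
print(f"Kcat = {mean_kcat:.5f} +/- {sd_kcat:.5f} per second")
```

Whether the sample or the population standard deviation was reported is not stated; the sketch uses the sample form.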

CbAgo BLI binding kinetics

BLI measurements were conducted using the Gator BLI system and GatorOne software (Gator Bio, v.2.7.3.0728). The running buffer consisted of 150 mM KCl, 20 mM HEPES (pH 7.5) and 0.5% BSA. His-tagged CbAgo binders were immobilized on the sensor tips at a concentration of 10 µg ml⁻¹. After immobilization, the tips were transferred into serial dilutions of CbAgo. Binding curves were globally fitted using a 1:1 interaction model in the Gator software.
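
For readers unfamiliar with the 1:1 interaction model, the sketch below writes out its textbook form: a single association rate constant and a single dissociation rate constant are shared across all analyte concentrations, which is what makes the fit global. This is not the GatorOne implementation; the function names and all rate constants are hypothetical.

```python
import math


def association_response(t_s, conc_m, k_on, k_off, r_max):
    """Sensor response during association for 1:1 binding at one analyte concentration."""
    k_obs = k_on * conc_m + k_off
    r_eq = r_max * conc_m / (conc_m + k_off / k_on)
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


def dissociation_response(t_s, r_start, k_off):
    """Sensor response after transfer to buffer, decaying from r_start."""
    return r_start * math.exp(-k_off * t_s)


# Hypothetical kinetic constants, giving an equilibrium constant of k_off / k_on = 10 nM.
K_ON = 1.0e5    # per molar per second
K_OFF = 1.0e-3  # per second
R_MAX = 1.0     # arbitrary response units

for conc_nm in (100.0, 50.0, 25.0, 12.5):
    response = association_response(300.0, conc_nm * 1e-9, K_ON, K_OFF, R_MAX)
    print(f"{conc_nm:6.1f} nM CbAgo -> response after 300 s: {response:.3f}")
```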

CbAgo SEC binding verification

Purified CbAgo was diluted to 0.8 mg ml⁻¹ (9.3 µM) and mixed with 0.2 mg ml⁻¹ binder protein and 9.3 µM 5′-phosphorylated 16-nt ssDNA guide (oDS423) in SEC buffer (20 mM HEPES pH 7.5, 250 mM KCl and 2 mM MgCl2). The mixture was incubated for 15 min at room temperature. After incubation, samples were resolved at room temperature on a Superdex 200 Increase 10/300 GL column (Cytiva Life Sciences) connected to a 1260 Infinity II high-performance liquid chromatography system (Agilent) using SEC buffer with a flow rate of 0.75 ml min⁻¹. The elution was measured using an Agilent 1260 Infinity II Multiple Wavelength Detector at 280 nm. The data were analysed using Astra v.8.1 (Wyatt Technology).
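
The mass and molar concentrations quoted above are related through the molecular weight. The short sketch below shows the conversion and recovers the CbAgo molecular weight implied by the stated pair of values (0.8 mg per ml and 9.3 µM). The binder molecular weight is not given in this section, so the value used here is purely hypothetical.

```python
def micromolar_from_mg_per_ml(mg_per_ml, mw_da):
    """Convert a mass concentration (mg per ml, equal to g per l) to micromolar."""
    return mg_per_ml / mw_da * 1e6


def implied_mw_da(mg_per_ml, micromolar):
    """Molecular weight implied by a mass and molar concentration pair."""
    return mg_per_ml / (micromolar * 1e-6)


cbago_mw = implied_mw_da(0.8, 9.3)  # both values are from the text
print(f"Implied CbAgo molecular weight: {cbago_mw / 1000:.1f} kDa")

HYPOTHETICAL_BINDER_MW_DA = 12_000.0  # assumption, not from the text
binder_um = micromolar_from_mg_per_ml(0.2, HYPOTHETICAL_BINDER_MW_DA)
print(f"Binder at 0.2 mg per ml: {binder_um:.1f} µM, "
      f"a {binder_um / 9.3:.1f}-fold molar ratio over CbAgo")
```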

AAV engineering

HEK293 cells adapted to culture in orbitally shaken bioreactors (HEKExpress, ExcellGene SA) were maintained in serum-free BalanCD HEK293 Medium (Irvine Scientific) supplemented with l-alanyl-l-glutamine (Gibco GlutaMax) at 37 °C, 80% humidity, 5% CO2, under constant shaking at 180 rpm (shaking diameter 5 cm). Cells were passaged every 3–4 days to a concentration of 0.2 × 10⁶ cells per ml. For the generation of cell lines stably expressing the target receptors used in the AAV transduction experiments, the receptor complementary DNAs (cDNAs) were obtained from an open reading frame collection (HER2, ORFeome Collaboration cDNA Clone, PD-L1, Addgene no. 121142) and cloned into a pRRLSIN lentiviral shuttle construct (Addgene no. 12252) with expression under the control of the human phosphoglycerate kinase (hPGK) promoter. Lentiviral particles were generated using standard procedures for calcium phosphate transfection of HEK293T cells with the pRRLSIN-hPGK-WPRE, p8.92, pMD2G and pAdVAntage plasmids. At 48 h, the vector-containing supernatant was harvested, filtered and concentrated by ultracentrifugation. The number of lentiviral particles present in the obtained vector suspension was quantified using a p24 antigen ELISA kit (ZeptoMetrix). HEKExpress cells were transduced in a six-well plate at a density of 3.0 × 10⁶ cells per well using a multiplicity of infection (MOI) of 100 vg per cell (conversion factor 1 pg p24 = 1 × 10⁴ vg). After 5 days, the cells were stained for the presence of the respective target receptor using an APC-conjugated antibody (0.8 µg ml⁻¹, BioLegend, 329707 (PD-L1), 324407 (HER2)) in staining buffer (PBS containing 0.5% BSA (Merck)) and sorted by flow cytometry using a Sony SH800 cell sorter. After expansion, the cells were aliquoted and frozen at −80 °C until further use.
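
The lentiviral dosing above follows directly from the stated conversion factor. As a worked example, the sketch below computes how much p24 antigen corresponds to an MOI of 100 vg per cell for 3.0 × 10^6 cells, and the volume of vector suspension that would deliver it. The stock p24 concentration is a hypothetical ELISA readout; the other numbers are from the text.

```python
VG_PER_PG_P24 = 1.0e4  # conversion factor stated in the text


def vector_genomes_needed(n_cells, moi_vg_per_cell):
    """Total vector genomes required to reach the requested MOI."""
    return n_cells * moi_vg_per_cell


def p24_pg_needed(n_cells, moi_vg_per_cell):
    """Equivalent mass of p24 antigen in picograms."""
    return vector_genomes_needed(n_cells, moi_vg_per_cell) / VG_PER_PG_P24


def stock_volume_ul(p24_pg, stock_pg_per_ul):
    """Volume of vector suspension that contains the requested p24 mass."""
    return p24_pg / stock_pg_per_ul


needed_pg = p24_pg_needed(n_cells=3.0e6, moi_vg_per_cell=100.0)
print(f"p24 required per well: {needed_pg / 1000:.0f} ng")

HYPOTHETICAL_STOCK_PG_PER_UL = 500.0  # assumption: example ELISA result
volume = stock_volume_ul(needed_pg, HYPOTHETICAL_STOCK_PG_PER_UL)
print(f"Volume of stock to add: {volume:.0f} µl")
```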

The pRepCap plasmids for the AAV production by transient transfection of HEKExpress cells, encoding the rep (AAV2) and cap (varying) genes, were chosen according to the different variants as indicated. For serotype 6 wild-type AAV, an AAV6 plasmid was ordered from the manufacturer (Aldevron, pALD-AAV6). For the variant carrying the knockout mutations to deplete the primary interactions with heparin (K459S and K531E) and sialic acid (V473D, N500E and T502S), a corresponding gene fragment was ordered as an insert with homology overhangs (Twist Bioscience) and cloned into pALD-AAV6 by BspEI/MscI, yielding the pRepCap knockout. To introduce the sequence encoding the designed miniprotein binders, an intermediate plasmid was created that, in addition to the five mutations to deplete the primary interactions, carries two silent mutations yielding MluI/NheI restriction sites in proximity to the chosen site of binder insertion between amino acid positions 497 and 498 of the serotype 6 VP3. The DNA sequences for the designed miniprotein binders, flanked by a single –(GSG)1– linker at both termini, were human-codon optimized using the GenSmart Codon Optimization tool and ordered as inserts with homology overhangs (Twist Bioscience) for subcloning.

For the AAV production for screening at small scale, the cells were seeded in 24-well cell culture plates at a density of 0.4 × 10⁶ cells per ml in a volume of 500 µl and transfected with 520 ng of pHelper (Aldevron, pALD-HELP), 250 ng of shuttle plasmid (Aldevron, pALD-ITR-GFP), 270 ng of pRepCap (varying) and 1.5 µg of polyethyleneimine (Polysciences). If applicable, the variants’ pRepCap plasmids were each mixed in a ratio of 1:2 with the pRepCap knockout plasmid. Then, 12 h after transfection, the cell culture medium was exchanged and supplemented with 4 mM valproic acid (Sigma). The cell culture was incubated at the standard conditions described above but without shaking, and the AAV-containing medium was harvested on day 5 by collecting the supernatant after centrifugation at 400g for 5 min at room temperature to remove cells.

For the AAV production for validation at a normalized MOI, the cells were seeded at a density of 1.0 × 10⁶ cells per ml in a volume of 300 ml in a TubeSpin 600 bioreactor tube (TPP) and transfected with 231 µg pHelper (Aldevron, pALD-HELP), 105 µg of shuttle plasmid (Aldevron, pALD-ITR-GFP), 105 ng pRepCap (varying) and 900 µg polyethyleneimine (Polysciences). If applicable, the pRepCap plasmids were obtained as described above, and mixed in a ratio of 1:2 with the pRepCap knockout plasmid. Then, 6 h after transfection, the cell culture medium was supplemented with 4 mM valproic acid (Sigma). The cell culture was incubated at the standard conditions described above for 7 days, and the vector was harvested on day 3–4 and on day 7 by collecting the supernatant after centrifugation of the bioreactor tube at 800g for 10 min at room temperature. The supernatant was filtered (Stericup Quick Release, Millipore Express PLUS 0.22 μm PES, 1,000 ml, Merck Millipore). The particles were concentrated from the cell culture supernatant to a concentration of at least 3.0 × 10¹⁰ vg ml⁻¹ by using Amicon Ultra-15 centrifugal filter units at a molecular weight cut off of 100 kDa (Merck). The wild-type and knockout variant particles were instead processed according to Gaudry et al.66. In short, the particles were purified from the cell culture supernatant using the POROS CaptureSelect AAVX resin (Thermo Fisher Scientific) on an ÄKTA Pure chromatography system followed by buffer exchange to PBS, 0.001% Pluronic F-68 (10% stock solution, Gibco) through Amicon Ultra-15 centrifugal filter units, at a molecular weight cut off of 100 kDa (Merck). The number of genome-containing AAV particles was determined after treatment with DNase I (Thermo Fisher) by digital PCR using the QIAcuity system and PCR kit (Qiagen).

For transduction, the target cells were seeded in 96-well cell culture plates at a density of 3 × 10⁵ cells per ml in a volume of 100 µl. After 6 h, the medium was replaced with 100 µl of AAV-containing medium from the production in 24-well cell culture plates, or with 100 µl of the material from the production in 300 ml of culture, diluted to 3 × 10¹⁰ vg ml⁻¹ to yield an MOI of 1 × 10⁵ vg per cell. If applicable, 0.8 µg ml⁻¹ target receptor blocking antibody (BioLegend, 329707) was added. After 48 h, the cells were washed twice with 100 µl of PBS containing 0.5% BSA (Merck), and the transduction signal (GFP) was measured by flow cytometry on an Attune NxT analyser (Thermo Fisher) equipped with an automated plate reader. The results were analysed using FlowJo v.10.8 Software (BD Life Sciences).
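
The normalized MOI quoted above can be checked from the seeding density, well volume and vector titre. The sketch below reproduces that arithmetic and adds a helper for the dilution needed to reach the target titre. It assumes the nominal seeded cell number (ignoring any growth during the 6 h before vector addition), and the stock titre is hypothetical.

```python
import math


def cells_per_well(density_per_ml, volume_ul):
    """Nominal number of cells seeded in one well."""
    return density_per_ml * volume_ul / 1000.0


def vector_genomes_added(titre_vg_per_ml, volume_ul):
    """Number of vector genomes contained in the added volume."""
    return titre_vg_per_ml * volume_ul / 1000.0


def moi(titre_vg_per_ml, vector_volume_ul, density_per_ml, seeding_volume_ul):
    """Multiplicity of infection in vg per cell."""
    genomes = vector_genomes_added(titre_vg_per_ml, vector_volume_ul)
    return genomes / cells_per_well(density_per_ml, seeding_volume_ul)


def dilution_factor(stock_titre_vg_per_ml, target_titre_vg_per_ml):
    """Fold dilution of a concentrated stock needed to reach the target titre."""
    return stock_titre_vg_per_ml / target_titre_vg_per_ml


# Values from the text: 3e5 cells per ml in 100 µl, 100 µl of vector at 3e10 vg per ml.
result = moi(3e10, 100.0, 3e5, 100.0)
print(f"MOI = {result:.1e} vg per cell")
assert math.isclose(result, 1e5)

HYPOTHETICAL_STOCK_TITRE = 1.2e11  # assumption: example digital PCR titre
print(f"Dilute stock {dilution_factor(HYPOTHETICAL_STOCK_TITRE, 3e10):.1f}-fold")
```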

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.