Introduction

The application of deep learning (DL) to molecular generation-based de novo drug design (DNDD) has emerged as a prominent interdisciplinary focus. Over the past few years, nearly 200 novel methods have been reported1,2,3,4,5, and many of these approaches have yielded compounds with demonstrated biological activity. In 2019, Aspuru-Guzik et al. designed a number of novel DDR1 tyrosine kinase inhibitors using a DL model based on reinforcement learning; remarkably, design, synthesis, and activity evaluation were completed within just 46 days6. In 2021, IBM synthesized and experimentally validated 20 polypeptide biomolecules designed by deep generative autoencoders within 48 days7. In the same year, Schneider et al. combined LSTM-based sequence generation models with microfluidic automated synthesis and testing platforms8. The project synthesized 25 organic molecules from scratch, 12 of which were identified as potential agonists for the LXR target in vitro8. In 2022, Godinez et al. developed a generative modeling approach based on a variational autoencoder (VAE)9. Using this method, they synthesized and experimentally profiled two compounds, both of which demonstrated low nanomolar efficacy in a malaria proliferation assay and a biochemical assay targeting PI4K. In the same year, Li et al. developed a generative model based on a conditional RNN that designed a selective RIPK1 inhibitor featuring a novel scaffold10; this inhibitor exhibits potent in vitro activity in protecting cells from necroptosis, along with notable in vivo efficacy in two inflammatory models. Collectively, these advances underscore significant progress in AI-based drug design technologies and in the automation of experimental laboratories, pointing to a promising future for this field.

DL-based DNDD has shown promising performance in a variety of pharmaceutical design tasks, including multi-objective optimization11,12,13,14, fragment-based drug design15,16, and the generation of binding conformations based on protein pockets17,18,19,20. As in traditional drug design, these approaches fall into two main categories: ligand-based and receptor-based. Ligand-based methods commonly use two-dimensional (2D) representations such as SMILES21,22,23 or molecular graphs22,24,25 as the inputs and outputs of DL frameworks, and they guide ligand generation with predicted properties such as logP, synthesizability coefficients, and QSAR activity6,11. Representative methods in this category include ChemVAE26,27, REINVENT12, and GENTRL6. Although these ligand-centric approaches yield valid and novel molecules, they often overlook the three-dimensional (3D) conformations of the generated molecules. To address this limitation, receptor-based approaches that utilize 3D molecular representations have been developed, including RELATION28, LIGAN19, ResGen17, and SurfGen18. These approaches explicitly model the intermolecular interactions between proteins and ligands during the generative process, learning to produce more favorable molecules that conform to the simple physical principles captured by the 3D structural training data.

As summarized above, DL-based molecular generation has made impressive advances, both in the quality of the generated molecules and in the number of factors that can be considered simultaneously when designing a real-world drug candidate. Despite this progress on multiple fronts, a major limitation of contemporary generative models is the synthesizability of the generated molecules. This practical challenge has severely restricted the endorsement of many AI-designed molecules by organic and medicinal chemists. In short, synthesizability is a major reason why more DL-proposed drug candidates with sufficiently novel scaffolds have not yet been translated into real-world applications6,7,8,9.

To enhance the synthesizability of generated molecules, many fragment-based strategies have been proposed, as discussed earlier. In particular, some researchers have integrated combinatorial chemistry techniques from traditional drug design into DL-based generative models, such as BBAR29, SynNet30, and DeepLigBuilder+31. These models assemble building blocks (synthons) into novel molecules via predefined reaction rules, and some can even propose plausible reaction pathways in their output. While these approaches offer a plausible demonstration that the generated molecules are highly synthesizable, they suffer, like traditional combinatorial chemistry techniques, from a few notable limitations32:

  (1) Difficulty in actually synthesizing generated molecules33,34: While these combinatorial generative methods suggest how to piece together a molecule via reaction rules, they often do not account for factors such as side reactions, stringent reaction conditions, additional activation steps, and steric hindrance. Hence, actual synthesis in a wet lab is often hindered by the scarcity of raw materials and harsh reaction conditions, requiring an extensive amount of time and resources.

  (2) Uncertainty in the bioactivity of the combined molecules: These reaction-based generative models primarily ensure high synthetic accessibility of the combined molecules, but they often do not guarantee the biological activity of the resulting molecules. Despite suggesting structures with potential pharmacophores, promising docking conformations, and favorable results in free energy perturbation (FEP) analyses29,35,36, the practicality of these molecules as potential drug leads remains speculative without wet-lab experimental validation30.

The core of synthesis-oriented molecular generative models lies in the set of predefined chemical reaction rules, which largely determine whether the combinatorially generated molecules can actually be synthesized in a wet lab33,34,37. One attractive rule is the copper-catalyzed azide-alkyne cycloaddition (CuAAC)38,39, which occurs at room to moderate temperatures (25 °C to 60 °C) using copper(I) catalysts such as CuBr or CuI. Alternatively, copper(I) can be generated in situ from copper(II) salts (e.g., CuSO4·5H2O) in the presence of a reducing agent (e.g., ascorbic acid)40,41. The reaction is carried out in polar solvents such as water, ethanol, DMSO, or THF, and involves mixing azides and alkynes, followed by the addition of ligands (e.g., triphenylphosphine or phenanthroline) to stabilize the copper(I) catalyst. The CuAAC reaction is characterized by rapid and highly selective conversion, typically completing within minutes to hours, and does not require stringent reaction conditions or complex purification steps40,41,42. With its standardized reaction conditions, minimal side reactions, and high yields, CuAAC is a modular reaction well suited for use with generative models. Its efficiency and simplicity have led to widespread application in drug development, materials science, and biolabeling, with excellent reproducibility. Furthermore, several studies43,44,45,46 indicate that CuAAC-based reaction rules can facilitate the construction of compound databases for virtual screening that encompass billions of compounds, with the assurance that up to 80% of these compounds can be synthesized31. Therefore, as the reaction rule for synthesis-oriented generative models, CuAAC can generate novel and diverse compounds while maximizing the practical feasibility of synthesis.

Drawing inspiration from the construction of virtual libraries such as the REAL47 database, which comprises billions of molecules, ClickGen adopts click chemistry as the foundational reaction rule, complemented by the modular amide-forming reaction. This reaction between carboxylic acids and amines, especially when DCC (N,N’-dicyclohexylcarbodiimide) or EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide) is used as the coupling agent, is well known for its high efficiency and ease of reproducibility. The reaction typically occurs under mild conditions using polar solvents such as dichloromethane or DMF (N,N-dimethylformamide). The DCC/EDC method effectively activates carboxylic acids, enabling their rapid reaction with amines to form amide bonds, often within minutes to hours48,49,50. Due to its simplicity and robustness, this method is preferred in various fields, including drug synthesis51, biomolecular labeling52,53, and materials science54,55, ensuring consistent and reliable results.

By combining modular reactions with an inpainting model and reinforcement learning for molecule generation, we aim to create novel, synthetically accessible, and biologically active molecules. To evaluate ClickGen’s performance in DNDD tasks, we first consider three different types of therapeutic targets: ROCK1, characterized by a relatively simple pocket structure; SARS-Cov2 Mpro (hereafter referred to as SARS-Cov2 or Mpro), recognized for its more complex pocket structure; and AA2AR, which differs from the first two targets in that its active molecules are antagonists. These three test targets have numerous known active ligands, which assist in assessing model performance. ClickGen’s performance on them strengthened our confidence in applying it to real-world drug design for the PARP1 target, which involved virtual screening, synthesis, and wet-lab bioactivity assessment of the generated molecules, and thereby tested the feasibility of ClickGen in practical drug discovery tasks.

Results

As depicted in Fig. 1a, the ClickGen model is composed of three main components: a chemical reaction combinator, an inpainting generative model, and a reinforcement learning module based on MCTS (Monte Carlo Tree Search). ClickGen utilizes customized synthons and employs reinforcement learning to direct molecule generation according to the properties of the protein pocket, as encapsulated in docking scores. The inpainting technique enables ClickGen to generate a more novel and diverse set of molecules: it replaces masked synthons of the parent core with valid (and potentially novel) synthons that may contribute to binding. After the synthons are combined according to the reaction rules, ClickGen can, under the guidance of reinforcement learning, perform numerous regenerations at the parent cores, resulting in markedly enhanced novelty and diversity. The interplay of the inpainting technique and reinforcement learning is key to the success of ClickGen, addressing the conflict between synthesizability and novelty that other synthesis-oriented generative models encounter.

Fig. 1: Overview of ClickGen and case study based on protein pocket generation.

a The complete ClickGen model after incorporating reinforcement learning (RL), where the prior model encompasses both the reaction-based combinator and the inpainting generative model; b Overview of the selection, expansion, simulation, and backpropagation processes in Monte Carlo Tree Search (MCTS), where the brown-yellow connecting lines represent the copper-catalyzed azide-alkyne cycloaddition (CuAAC) reaction and the black connecting lines represent the amidation reaction; c A complete example of ClickGen generating a molecule within the SARS-Cov2 Mpro pocket. The generation of this compound, which comprises a total of four synthons, is guided by MCTS, which selects the highest-scoring reaction pathway through the stages of expansion, simulation, and backpropagation; the reinforcement learning algorithm guides the steps of the chemical-reaction combinator and the inpainting generator based on the pocket’s properties.
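For illustration, the four MCTS stages named in panel b can be sketched in a few lines of Python. This is a generic UCB1-based search under stated assumptions, not the ClickGen implementation: the synthon labels, the `reward_fn` callback (a stand-in for a docking-based reward), the exploration constant `c`, and the fixed pathway depth are all hypothetical.

```python
import math
import random


class Node:
    """One partial reaction pathway: the tuple of synthons chosen so far."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def ucb1(self, c=1.4):
        # Unvisited children get priority; c is an assumed exploration constant.
        if self.visits == 0:
            return float("inf")
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts(synthons, reward_fn, max_depth=3, n_iter=200, seed=0):
    """Generic MCTS over synthon sequences; reward_fn is a hypothetical scorer."""
    rng = random.Random(seed)
    root = Node(state=())
    for _ in range(n_iter):
        # 1. Selection: descend by UCB1 until a node without children is reached.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one child per candidate synthon and pick one at random.
        if len(node.state) < max_depth:
            node.children = [Node(node.state + (s,), node) for s in synthons]
            node = rng.choice(node.children)
        # 3. Simulation: complete the pathway with random synthons and score it.
        rollout = list(node.state)
        while len(rollout) < max_depth:
            rollout.append(rng.choice(synthons))
        reward = reward_fn(tuple(rollout))
        # 4. Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Read out the most-visited pathway.
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda child: child.visits)
        path.append(node.state[-1])
    return path, root


if __name__ == "__main__":
    toy_synthons = ["azide_A", "alkyne_B", "amine_C"]  # hypothetical labels
    toy_reward = lambda pathway: pathway.count("alkyne_B") / len(pathway)
    best_path, tree = mcts(toy_synthons, toy_reward)
    print(best_path, tree.visits)
```

In a real setting the toy reward would be replaced by a docking-score-based function, and expansion would be restricted to synthons that are compatible under the reaction rules.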

Performance of the chemical reaction combiner and inpainting generator

Because the chemical reaction combiner and the inpainting generator are trained independently, the effectiveness of each model must be evaluated separately. To this end, both models were used independently for molecular generation. The evaluation of the combiner focused on the accuracy of molecular combinations and the balanced utilization of synthons, while the evaluation of the inpainting generator focused on the rationality of the completed molecular fragments and of the overall molecule.

The chemical reaction combiner is trained on the REAL dataset. After training, the model can combine synthons according to the easily synthesizable characteristics defined by the REAL dataset. During training, positive and negative samples (synthons) are provided as candidates to be combined with the input synthons, and a loss function is constructed to sharpen the distinction between positive and negative choices. The ratio of positive to negative synthons in this negative sampling process significantly impacts the model’s accuracy29,56,57,58. To train the chemical reaction combiner effectively, different ratios were compared to identify the optimal setting. As shown in Table S1, the validity of the generated molecules reached its maximum when negative synthons outnumbered positive synthons tenfold. At this ratio, molecules with exposed synthon endpoints, which are deemed invalid, are not generated. Moreover, certain undesired reactions, such as the condensation of carbonyls or of amines, are avoided during generation. The reaction distributions in the generated molecules, also reported in Table S1, indicate that a balanced utilization of the four reaction types was achieved only when the negative-to-positive ratio reached ten. When the ratio was below ten, the generated molecules leaned towards reaction 1 and reaction 3, leading to more monotonous molecular structures. Increasing the number of negative synthons beyond a tenfold ratio, or increasing the numbers of positive and negative samples proportionally, improved neither reaction accuracy nor the balance of reaction types. Considering both model performance and training efficiency, the optimal numbers of correct and incorrect synthons for training the chemical reaction combiner were determined to be 100 and 1,000, respectively.
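The negative sampling setup described above can be illustrated with a minimal sketch. The text does not specify the exact loss, so a mean binary cross-entropy over raw compatibility scores is assumed here; the function names, the scores, and the synthon labels are hypothetical, and only the 1:10 positive-to-negative ratio comes from the text.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_negatives(pool, positives, ratio=10, seed=0):
    """Draw `ratio` negative synthons per positive from a candidate pool."""
    rng = random.Random(seed)
    positive_set = set(positives)
    candidates = [s for s in pool if s not in positive_set]
    k = min(len(candidates), ratio * len(positives))
    return rng.sample(candidates, k)


def negative_sampling_loss(pos_scores, neg_scores):
    """Assumed loss: mean binary cross-entropy on raw scores (logits).

    Positives are pushed towards high scores, negatives towards low scores:
    -log(sigmoid(s)) = softplus(-s) and -log(1 - sigmoid(s)) = softplus(s).
    """
    pos_term = sum(softplus(-s) for s in pos_scores)
    neg_term = sum(softplus(s) for s in neg_scores)
    return (pos_term + neg_term) / (len(pos_scores) + len(neg_scores))


if __name__ == "__main__":
    pool = [f"synthon_{i}" for i in range(50)]  # hypothetical synthon labels
    negatives = sample_negatives(pool, ["synthon_0"], ratio=10)
    print(len(negatives))
    # An untrained scorer (all scores 0) versus a well-separated one.
    print(negative_sampling_loss([0.0], [0.0] * 10))
    print(negative_sampling_loss([5.0], [-5.0] * 10))
```

With 100 positives and 1,000 negatives, the same functions apply unchanged; only the list lengths differ.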

{\(lmol\), \(mmol\), \(rmol\)} represent the three parts (left, middle, and right) of a molecule input during training, and {\(l^{\prime}mol\), \(m^{\prime}mol\), \(r^{\prime}mol\)} are the corresponding outputs from the model. After training, the inpainting generator can generate the central molecular fragment, \(m^{\prime}mol\), based on the end fragments, \(lmol\) and \(rmol\), to assemble novel molecules. Under the guidance of the \(\mathscr{L}_{recon}\) and \(\mathscr{L}_{consis}\) loss functions described in the section “Inpainting-based generative model”, molecules generated by the inpainting generator should exhibit the following features:

  (i) Structural integrity and feasibility of the overall molecule;

  (ii) Physicochemical properties that align with the training distribution;

  (iii) The \(m^{\prime}mol\) should retain reactive chemical bonds (e.g., amide bonds, triazole rings) while promoting structural novelty;

  (iv) Both the \(l^{\prime}mol\) and \(r^{\prime}mol\) ends should remain consistent with \(lmol\) and \(rmol\), respectively.

To this end, we evaluated the generated \(m^{\prime}mol\) and the composite molecules {\(l^{\prime}mol\), \(m^{\prime}mol\), \(r^{\prime}mol\)}, with the results summarized in Table S2 and Fig. 2.

Fig. 2: The kernel density estimation (KDE) distributions of the generated molecular set (n = 10,000) and REAL dataset.

a–h The x-axes of the different subplots represent different physicochemical properties, with the blue curve indicating the distribution of the REAL dataset (REAL) and the orange curve representing the distribution of the molecular set generated by the inpainting generator (Gen. Mol). M-mol and M’-mol in (a) refer to the intermediate fragment molecules from the REAL dataset and the intermediate molecules generated by the inpainting generator, respectively. The y-axis represents the value of the Kernel Density Estimate (KDE), indicating the density of the x-axis values at the corresponding positions. logP octanol-water partition coefficient, QED Quantitative Estimate of Drug-likeness.

As shown in Table S2, the inpainting generator generates molecules with high validity and novel structures. Furthermore, during \(m^{\prime}mol\) generation, the reactive chemical bonds, as well as the structures of \(lmol\) and \(rmol\), are retained. This retention bodes well for the subsequent steps in ClickGen, reducing potential errors in synthesis-oriented molecular generation. Figure 2 shows that the physicochemical property distributions of the assembled molecules align closely with those of the REAL training set. The \(m^{\prime}mol\) generation process considers the cohesiveness of its fragments with the adjacent \(lmol\) and \(rmol\), ensuring that the assembled molecule retains the physicochemical properties of the molecules in the training set. In short, ClickGen can provide structural novelty and physicochemical consistency.
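As a reminder of what the KDE curves in Fig. 2 represent, the following minimal sketch estimates a one-dimensional density from property values. A Gaussian kernel and a hand-picked bandwidth are assumed here, since the text does not state which kernel or bandwidth was used; the logP values are made up for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from 1D samples.

    Assumes a Gaussian kernel; `bandwidth` is a user-chosen smoothing width.
    """
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


if __name__ == "__main__":
    # Hypothetical logP values for a reference set and a generated set.
    reference_logp = [1.8, 2.1, 2.4, 2.9, 3.3]
    generated_logp = [1.9, 2.2, 2.6, 3.0, 3.1]
    ref_density = gaussian_kde(reference_logp, bandwidth=0.4)
    gen_density = gaussian_kde(generated_logp, bandwidth=0.4)
    for x in (1.5, 2.5, 3.5):
        print(x, round(ref_density(x), 3), round(gen_density(x), 3))
```

Two sets with closely matching property distributions yield nearly overlapping density curves, which is the visual criterion used in Fig. 2.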

Quality of the molecular set generated by ClickGen

We employed BBAR29 and SynNet30 as benchmark models to evaluate the performance of two ClickGen variants in de novo drug design targeting ROCK1, SARS-Cov2 Mpro, and AA2AR. For BBAR, we followed the instructions in the code repository referenced in the original article, constructing the dataset of molecules and protein docking scores and leaving all other parameters at their default values. For SynNet, we used a surrogate QSAR model28,59,60 for target evaluation and trained it following the instructions in its code repository, again with all other parameters set to default. The two ClickGen variants differ as follows: ClickGen uses only the chemical reaction combiner and therefore combines synthons from pre-defined sets, whereas ClickGen-inpainting additionally incorporates the inpainting generator and can therefore combine newly generated synthons. The structure, scaffold, and physicochemical properties of the molecules generated by the different models are summarized in Table 1.

Table 1 Quality of the molecular set (n = 10,000) generated by the chemistry reaction-based generative models

Table 1 shows that both ClickGen models produce molecules with high validity, uniqueness, and diversity, surpassing the benchmark models in terms of molecular quality. Both ClickGen models also generate a greater number of scaffolds, thereby reducing the production of homologous structures (compounds that possess the same or a similar parent core). However, ClickGen-inpainting produces significantly more novel scaffolds. This suggests that the inpainting-based generator, by filling in missing molecular segments, can design more novel structures than the combinator alone. Molecules generated by both models exhibit low FCD (Fréchet ChemNet Distance) values with respect to the inhibitors, indicating that the generated molecules potentially share high physicochemical similarity with the inhibitors.

As evident from Table 1, both ClickGen models generate molecules with notable novelty. To assess the synthetic feasibility of these novel molecules, we analyzed their synthesizability with four evaluation metrics. Because the ROCK1 target pocket is relatively straightforward and the chemical structures of its active molecules are simple, the models generate molecules with high synthesizability for this target. To provide a more informative comparison of synthesizability among the models, we therefore selected the Mpro target, characterized by a complex pocket structure, for the target-specific task. Figure 3 presents these results. Lower values of SC-score61 (Synthetic Complexity score, ranging from 1–5) and GASA62 (Graph Attention-based assessment of Synthetic Accessibility, values of 0 or 1) indicate easier synthetic accessibility of the molecules. Conversely, for RA-score63 (Retrosynthetic Accessibility score, values of 0 or 1) and SYBA64 (SYnthetic Bayesian Accessibility), higher values suggest better synthesizability. As illustrated in Fig. 3, regardless of whether the evaluation used descriptor-based or graph-based metrics, the molecules generated by both ClickGen models outperformed those of the benchmark models on all four metrics, indicating their high synthetic feasibility. Together with the FCD values in Table 1, this shows that the novel-scaffold molecules generated by the ClickGen-inpainting model neither compromise synthetic accessibility nor deviate significantly in their physicochemical properties. This indicates that the reinforcement learning effectively accounts for the overall synthesizability of the compounds, thereby selecting more appropriate synthons.

Fig. 3: Synthesizability assessment of the molecules generated by different models for the SARS-Cov2 Mpro target.

a–d represent the SC-score (Synthetic Complexity score, ranging from 1 to 5), RA-score (Retrosynthetic Accessibility score, with values of 0 or 1), GASA (Graph Attention-based assessment of Synthetic Accessibility, with values of 0 or 1), and SYBA (SYnthetic Bayesian Accessibility) scores, respectively. Lower values of SC-score and GASA indicate greater synthetic accessibility of the molecules. Conversely, for the RA-score and SYBA, higher values suggest enhanced synthesizability.

Physicochemical properties of molecules generated by ClickGen

Because ClickGen’s combinatorial approach employs the same reaction rules as the REAL database, a primary concern is whether this synthesis-oriented generation can extend beyond the chemical space covered by the REAL database. We first performed virtual screening of the REAL database against the ROCK1, SARS-Cov2, and AA2AR targets. Applying a tiered selection strategy involving HTVS, Glide-SP, and Glide-XP, and retaining the top 10% of compounds at each step, we selected the top 10,000 ranked molecules from the final results as the benchmark molecular sets.
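The tiered selection described above amounts to a simple funnel, sketched below under stated assumptions. The scorer callables are hypothetical placeholders for the HTVS, SP, and XP docking stages (no docking software is called), lower scores are assumed to be better, and the function and parameter names are illustrative only.

```python
def tiered_screen(molecules, stage_scorers, keep_percent=10, final_top_n=None):
    """Apply successive scoring stages, keeping the best fraction at each one.

    `stage_scorers` is a list of callables mapping a molecule to a score, where
    a lower (more negative) score is assumed to be better, as for docking.
    """
    survivors = list(molecules)
    for scorer in stage_scorers:
        ranked = sorted(survivors, key=scorer)
        # Integer arithmetic avoids floating-point surprises; keep at least one.
        n_keep = max(1, len(ranked) * keep_percent // 100)
        survivors = ranked[:n_keep]
    if final_top_n is not None:
        survivors = survivors[:final_top_n]
    return survivors


if __name__ == "__main__":
    # Toy library: integers stand in for molecules, and each hypothetical
    # stage simply scores a molecule by its own value.
    library = list(range(100_000))
    fast, standard, precise = (lambda m: m), (lambda m: m), (lambda m: m)
    hits = tiered_screen(library, [fast, standard, precise], final_top_n=50)
    print(len(hits), hits[:3])
```

Because each stage only sees the survivors of the previous one, the most expensive scorer is evaluated on about 1% of the original library.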

We then employed the two ClickGen models to generate molecular sets for each of the three targets and screened them in the same manner, ultimately obtaining the top 2,000 ranked generated molecules. To visualize the chemical space of the molecules from the different methods, we applied t-SNE (t-distributed stochastic neighbor embedding) to ECFP6 fingerprints65. The results, depicted in Fig. 4, show that both ClickGen models cover a more expansive chemical space than the benchmark set from the REAL database. Furthermore, the model incorporating the inpainting generator explores a larger chemical space than the one without it. This suggests that the combinatorial approaches used by both ClickGen models can generate more novel molecules and have the potential to circumvent patent protection.

Fig. 4: t-SNE analysis of the chemical space of the benchmark molecule set from the REAL database and the molecules generated by the ClickGen models (n = 2000).

a and b represent the ROCK1 target, c and d correspond to the main protease of SARS-CoV-2, and e and f pertain to the AA2AR target. The blue points indicate the distribution of molecules generated by the two ClickGen models, while the orange points represent the distribution from the REAL database.

Next, we analyzed the number of reaction steps and the molecular weight of the reagents used at each step for the benchmark molecule set and the generated molecules. The results are presented in Fig. 5 (SARS-Cov2) and Fig. S1 (ROCK1 and AA2AR). As shown in the upper parts of both figures, the majority of the molecules in the REAL benchmark dataset are synthesized in 1 to 2 reaction steps, whereas the molecules generated by the two ClickGen models require 3 to 4 reaction steps. These multi-step reactions lead to molecules with more diverse scaffolds, contributing to a more extensive exploration of the chemical space. In the lower parts of the figures, the molecular weight of the reagents decreases as the number of reaction steps increases. Judging from the molecular weights, the third and fourth reaction steps typically modify chemical groups and scaffold termini. The increase in the number of steps does not pose synthesis difficulties and enables flexible structural modifications based on the shape of the protein pocket.

Fig. 5: Chemical reaction steps required for molecule generation (top, n = 2000) and the corresponding molecular weight distribution of synthons used in each step (bottom, n = 2000) across different molecule sets, targeting the SARS-Cov2 protein.

a displays the number of combination steps for molecules in the REAL dataset, as well as the number of combination steps for molecules generated by the two ClickGen models. b–d represent the molecular weight distribution of reagents used in the combination reactions for the REAL dataset, ClickGen, and ClickGen-inpainting, respectively.

Finally, we analyzed the physicochemical properties of the molecules generated by the ClickGen model. Using the ROCK1 and SARS-Cov2 inhibitors (described in the “Preparation of dataset and building blocks” section) as references, we compared the similarity between the generated molecules and the inhibitors and assessed docking scores. As shown in Fig. 6a, the generated molecules span the chemical space of the inhibitors, with average docking scores of approximately −10 kcal mol⁻¹ for ROCK1 and −9 kcal mol⁻¹ for SARS-Cov2, as indicated by the color gradient. In contrast, the molecules generated by the BBAR and SynNet models (Fig. S2) are widely dispersed relative to the chemical space of the inhibitors, with average scores of around −6 kcal mol⁻¹. This suggests that the molecules generated by ClickGen achieve better docking performance than those from generative models of the same type.

Fig. 6: Docking analysis and inhibitor similarity analysis of different generative models (n = 2000).

a t-SNE analysis of the chemical space distribution between the generated molecules and the corresponding target-active compounds. Orange points represent the active compounds, and the color gradient of the remaining points represents the Vina scores of the generated molecules. Lighter colors (yellow to light green) indicate better scores, ranging from −13 to −10 kcal mol⁻¹, while darker colors (green to dark blue) indicate worse scores, ranging from −10 to −7 kcal mol⁻¹. Note that the scores vary across targets, so comparisons are meaningful only within the same target. b KDE distribution of Ligand Efficiency (LE) for the molecules generated by different methods across three targets; c Distribution of the Vina scores of the generated molecules and their similarity to inhibitors.

As illustrated in Fig. 6b, the Ligand Efficiency (LE) values for the molecules generated by the four methods range between 0.2 and 0.3 kcal mol⁻¹ per heavy atom. Although the LE values for the two baseline methods are slightly lower than those for the two ClickGen methods, all four fragment-based methods can be regarded as highly efficient in molecular design: their molecules contain no excess of “inefficient” atoms, i.e., atoms that contribute minimally to binding with the target. In the comparative analysis of similarity and docking scores in Fig. 6c, most high-scoring generated molecules exhibit a high Tanimoto similarity to the inhibitors, indicating that they share structural features with active inhibitors.
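Both quantities discussed above have simple closed forms, shown in the sketch below. LE is taken here as the negated docking score divided by the heavy-atom count, using the docking score as a proxy for binding free energy; Tanimoto similarity is computed on sets of "on" fingerprint bits. The function names, the example score, and the bit sets are hypothetical, and real fingerprints such as ECFP would come from a cheminformatics toolkit.

```python
def ligand_efficiency(docking_score, n_heavy_atoms):
    """LE in kcal/mol per heavy atom, using the docking score as an energy proxy."""
    if n_heavy_atoms <= 0:
        raise ValueError("n_heavy_atoms must be positive")
    return -docking_score / n_heavy_atoms


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


if __name__ == "__main__":
    # A hypothetical molecule with 36 heavy atoms and a score of -9.0 kcal/mol.
    print(ligand_efficiency(-9.0, 36))   # 0.25
    # Two toy fingerprints sharing two of four distinct bits.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

An LE of 0.25 falls within the 0.2 to 0.3 range reported for the generated molecules.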

Feasibility of the ClickGen model in real-world drug design

To assess the feasibility of the ClickGen model in the task of de novo drug design for SARS-Cov2, we compared the structures, docking conformations, and binding conformations of the generated molecules with those of non-covalent inhibitors. Additionally, we examined the synthesizability of the generated molecules. The results are summarized in Fig. 7.

Fig. 7: Docking conformation analysis of the sampling result (n = 2000) of two ClickGen models, and synthetic route prediction for the generated molecules with the most similar conformations to active inhibitors targeting SARS-CoV-2 Mpro.

a The first two plots depict the distribution curves of the Root Mean Square Deviation (RMSD) values between the generated and the ideal conformations. The third plot shows the proportion of the generated molecules and inhibitors interacting with the key HIS41 and CYS145 residues in the SARS-Cov2 target pocket. b Comparison of the conformations of the molecules (red) generated by the two ClickGen models with the closest conformations of the active inhibitors (green), along with the synthesis routes of the generated molecules. Reaction labels: a, amidation; b, copper-catalyzed azide-alkyne cycloaddition (CuAAC); c, deprotection. Gen.Mol: generated molecules.

We first re-docked all the generated molecules with the protein, using the active molecule with the highest Tanimoto similarity as a template. This approach yielded the most “ideal” binding conformation for each generated molecule, which was then compared with the conformation produced during generation by calculating the Root Mean Square Deviation (RMSD). The resulting RMSD values were used to create a distribution curve. In Fig. 7a, the first two plots show that, for both ClickGen models, the average RMSD between the generated and the ideal binding conformations is less than 1 Å, indicating that both models are capable of generating molecules in near-ideal binding conformations. When the inpainting model is used, the conformational divergence increases, with the average RMSD shifting from approximately 0.6 Å to 0.9 Å. However, the third plot shows that ClickGen-inpainting, like ClickGen, maintains interactions with the residues HIS41 and CYS145 in the S1 region, two amino acid residues that are crucial for designing SARS-Cov2 inhibitors. Upon close examination of the protein-ligand interaction profiles of these docked molecules (Fig. S3), we find that while the overall interactions with the two amino acids are similar, there are distinct differences in specific interactions. ClickGen-inpainting’s interactions with HIS41 closely resemble those of the inhibitors, whereas ClickGen forms more hydrophobic interactions and fewer hydrogen bonding interactions with HIS41. Conversely, with CYS145, ClickGen matches the interactions of the inhibitors, while ClickGen-inpainting forms fewer hydrogen bonding interactions.
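The RMSD comparison used above can be written out explicitly. The sketch below computes an in-place RMSD between two conformations of the same molecule, assuming that both coordinate lists use the same atom ordering and the same reference frame (no superposition is performed, as is usual when comparing poses within one binding pocket). The coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """In-place RMSD (same units as the input) between matched atom coordinates.

    Each argument is a sequence of (x, y, z) tuples with identical atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformations must be non-empty and the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical two-atom pose, shifted by 1 Å along z in the second pose.
    generated_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ideal_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(generated_pose, ideal_pose))  # 1.0
```

For molecules with symmetric substructures, a symmetry-aware atom matching would be needed before applying this formula; that step is omitted here.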

Beyond HIS41 and CYS145, we observe that the inhibitors form more interactions with the residues GLU166 and GLN189. For GLU166, both ClickGen models replicate the inhibitors’ interactions. With GLN189, however, the generated molecules form relatively fewer hydrogen bonds. Additionally, the molecules generated by both ClickGen models form more interactions with the residues SER46, GLU47, ASP48, MET49, and LEU50 than the inhibitors do. Past studies66,67,68 have shown that these residues typically interact with compounds that exhibit micromolar activity or require further structural modification. This suggests that a subset of the molecules generated by the ClickGen model might require structural modification to increase the likelihood of obtaining highly active compounds. Overall, these results indicate that crucial residue interactions are preserved during the generative process. Based on the results in Fig. S4, both ClickGen models deliver significantly better performance than the BBAR and SynNet models.

Subsequently, we used three structurally distinct inhibitors57,58,69,70,71, CQ-TrOne, MCULE, and L-26, as references to identify the molecules with the highest Tanimoto similarity among the generated results of both models. We analyzed their conformations and synthesizability, as depicted in Fig. 7b. Both ClickGen models were able to generate compounds with conformations similar to those of CQ-TrOne and MCULE. However, only the ClickGen model without the inpainting component was able to generate conformations similar to L-26. This is because, although L-26 shares a similar molecular weight and binding conformation with MCULE, the inpainting model tends to deviate from the S1 region of the binding site after generating the indole ring of L-26. In contrast, the piperazine ring can enter the S1 region after generation, achieving a better fit within the pocket. ClickGen without the inpainting model can directly select appropriate building blocks and generate compounds, thus allowing for a more direct and rational combination.

Regarding molecular structures, we observed that the molecules designed by the ClickGen-inpainting model exhibit a slightly higher level of innovativeness compared to the inhibitors. The core scaffolds of the generated molecules are significantly altered relative to those of the inhibitors with the most similar conformations. Conversely, the molecules designed by the ClickGen model show lower innovativeness compared to the active inhibitors, sharing highly repetitive scaffolds with them. The primary reason for this difference between the two models is the introduction of the inpainting technique. In the case of ClickGen-inpainting, after the combination process, the model can generate scaffolds based on the active pocket, thereby achieving scaffold hopping. Models without inpainting, on the other hand, use building blocks obtained from a curated database containing a fixed number of synthons during assembly, resulting in some generated molecules sharing non-novel core structures.

Finally, we selected some structurally innovative molecules and conducted an analysis of their synthesis feasibility. As depicted in the right portion of Fig. 7b, it is evident that the ClickGen-generated molecules without the inpainting model require fewer synthesis steps. The core scaffolds utilized in these molecules have previously been reported for synthesizing similar inhibitors, eliminating the need for extensive exploration of synthetic conditions. Additionally, modifying segments such as 6-methylnicotinic acid are readily available from chemical reagent suppliers. In contrast, ClickGen-inpainting, although utilizing inpainting-generated building blocks, incorporates relatively novel scaffolds compared to the REAL dataset. However, these components are not entirely new fragments and can be procured as complete segments or precursors from chemical reagent suppliers, including compounds like bromomethyl indole, acetic acid, aminoethyl piperidine, and whey acid, among others. These synthons can be synthesized into the generated molecules through click chemistry and amidation. Both ClickGen-generated models exhibit differences in structural novelty, yet they can produce binding conformations similar to active inhibitors and possess high synthetic feasibility. ClickGen models share some core scaffolds with existing inhibitors, resulting in fewer synthesis steps, whereas ClickGen-inpainting generates molecules with higher novelty in core scaffolds, potentially requiring additional synthesis steps.

Design PARP1 inhibitors with ClickGen

Considering the impressive performance of ClickGen-inpainting in the three earlier target validations, we proceeded to test its design capability in the wet lab against a therapeutic target, PARP1 (a current research interest of some of us). PARP1, a pivotal enzyme in DNA repair mechanisms, has emerged as a promising target in the development of anticancer therapies72,73,74. The recent FDA approval of novel PARP1 inhibitors further underscores its therapeutic significance74. We retrained the ClickGen model by incorporating the protein structure of PARP1 (PDB ID: 4BJC75) as the docking scoring reward in its reinforcement learning section, enabling the model to generate molecules based on the PARP1 protein pocket, and then generated a library of 100,000 molecules.

The workflow for the screening of the generated molecules is illustrated in Fig. 8a. The virtual screening process began by filtering out the molecules from the generated set that are non-novel, invalid, redundant, or highly similar in scaffold (Tanimoto similarity > 0.9). Following this, the molecular physicochemical properties were screened (with criteria of 250 < MW < 600 and 0 < logP < 8). Subsequently, the pharmacophore matching module of Schrödinger software was employed to select molecules based on pharmacophore features. Only those molecules capable of forming the hydrogen bonds with GLY863 and SER904, as hypothesized in the pharmacophore model, were chosen for further docking studies.
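The first two filtering stages can be illustrated with a minimal Python sketch. It assumes that validity has already been checked and that molecular weight, logP and a scaffold fingerprint (represented here as a set of bit indices) have been precomputed. The `Candidate` class, the `prefilter` function and the greedy handling of scaffold similarity are illustrative assumptions, not the pipeline used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str
    mol_weight: float          # precomputed molecular weight (Da)
    logp: float                # precomputed logP
    scaffold_bits: frozenset   # hypothetical scaffold fingerprint (set of on-bits)


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prefilter(candidates, known_smiles, sim_cutoff=0.9):
    """Drop non-novel, duplicate, out-of-range and scaffold-redundant molecules."""
    kept, seen = [], set()
    for c in candidates:
        if c.smiles in known_smiles or c.smiles in seen:
            continue  # non-novel or redundant
        if not (250 < c.mol_weight < 600 and 0 < c.logp < 8):
            continue  # outside the physicochemical window quoted in the text
        if any(tanimoto(c.scaffold_bits, k.scaffold_bits) > sim_cutoff for k in kept):
            continue  # scaffold too similar to a molecule already kept (assumed greedy rule)
        seen.add(c.smiles)
        kept.append(c)
    return kept
```

In this reading, the similarity cutoff is applied between generated molecules themselves; the text does not state the reference set explicitly, so this is one possible interpretation.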

Fig. 8: Workflow for generating and screening molecules, including MTT assay and inhibitory effects of selected lead compounds.

a Workflow for virtual screening of the molecules generated by the ClickGen model to identify lead compounds. The screening workflow involves filtering generated molecules based on novelty, validity, redundancy, and scaffold similarity, followed by physicochemical property assessment and pharmacophore matching to select molecules capable of forming key hydrogen bonds. These molecules underwent docking simulations, and the top 1% based on docking scores were clustered by scaffold. Clusters were evaluated comprehensively, considering similarity to known PARP1 inhibitors, synthetic accessibility, and intellectual property potential. Compounds were categorized into low, medium, and high synthetic difficulty based on raw material availability and reaction complexity. After rigorous evaluation of synthesis routes, three lead molecules, spanning different synthetic challenges, were identified for further investigation; b The MTT (3-(4,5-Dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) assay (n = 3) results showing the effects of the three lead compounds and the positive control on human lung adenocarcinoma (A549), human ovarian (OVCAR-3), human colon (HCT-116), and human breast (MCF-7) cancer cell lines; c The inhibitory effects of the three lead compounds and the positive control on PARP-1 enzyme activity (n = 3). In the curves of subplots b and c, the dots represent the mean IC50 values from three independent measurements, and the error bars indicate the standard deviation (SD). Detailed mean ± SD raw data of IC50 are recorded in the source data.

In the subsequent docking studies, we employed the Glide module with the XP scoring mode of Schrödinger for scoring the molecules. After conducting docking studies, approximately 700 molecules, representing the top 1% in the docking scores, were selected and clustered based on their scaffolds. Following this, we conducted a comprehensive evaluation of each molecular cluster, considering metrics like similarity to PARP1 inhibitors, synthetic accessibility (as mentioned in the section Quality of the molecular set generated by ClickGen) and potential for intellectual property protection. Subsequently, we classified the compounds into different categories of synthetic difficulty based on the type of raw materials and reaction chemical bonds.

Specifically, compounds that are easy to synthesize are those where the reagents can be directly purchased or the chemical bond types are limited to amide and triazole structures. Compounds of moderate synthetic difficulty require potentially more synthetic steps, which may involve the protection and deprotection of certain groups, but all necessary synthons are commercially available or can be prepared by simple reactions. Compounds with high synthetic difficulty involve reagents that require more than two synthetic steps and encompass diverse types of chemical bonds. Ultimately, based on the novelty of the scaffolds, we selected 30 compounds across the categories of low, medium, and high synthetic difficulty. After a comprehensive evaluation of the planned synthetic routes for each cluster, focusing on the availability of raw materials and the ease of reaction conditions (in supplementary material S1 section, we conducted a detailed analysis of the synthesis routes for 30 compounds and synthesized some intermediates to demonstrate the synthetic feasibility of our method), we ultimately selected three lead molecules. These three lead compounds exhibit varying degrees in synthetic difficulty:

Lead compound 1 is easy to synthesize using readily available starting materials: Coumarin-3-carboxylic acid, 3-ethynylaniline, and 1-azido-2-bromobenzene. The final compound is obtained through a two-step synthesis process involving amidation and click chemistry.

Lead compound 2 presents a higher level of synthetic challenge, as it is composed of two synthons. One of these synthons has a complex structure that requires multi-step synthesis.

Lead compound 3 has a moderate level of synthetic difficulty. While its reactant materials are relatively easy to obtain, the formation of a novel urea chemical bond and the requirement for multiple combinatory steps necessitate the protection of specific functional groups. The synthetic route employed for the three lead compounds is illustrated in supplementary materials. The structural integrity of the target compounds was verified through 1H-NMR, 13C-NMR, and HRMS spectroscopic analysis.

Lead compound 1: Mp: 201.4-201.8°C. 1H NMR (600 MHz, DMSO-d6) δ 7.77(s, 1H), 7.61(td, J = 8.1, 1.2 Hz, H), 7.60(td, J = 8.1, 1.2 Hz, H), 7.27-7.24(m, 5H), 7.04-6.87(m, 5H), 4.09 (dd, J = 4.6, 1.6 Hz, 1H), 3.29-3.24 (m,2H); 13C NMR (150 MHz, DMSO-d6) δ 175.22, 171.01, 168.74, 156.93, 154.09, 150.78, 145.64, 140.91, 143.51, 139.1, 136.16, 134.94, 133.97, 127.17, 124.10, 122.26, 121.77, 116.57, 114.07, 108.25, 95.24, 94.39, 50.22, 22.04; ESI-HRMS calcd. for C24H18BrN4O3 [M + H]+ 490.3290, found: 490.3342.

Lead compound 2: Mp: 166.6-166.9°C.1H NMR (600 MHz, DMSO-d6) δ 7.82–7.79 (m, 3H), 7.42 (td, J = 7.9, 1.2 Hz, 3H), 7.36–7.34 (m, 3H), 2.88 (s, 1H), 2.73–2.72 (m, 2H), 1.98 (s, 2H), 1.63 (d, J = 12.1 Hz, 2H), 1.48 (m, 3H), 1.29–1.15 (m, 7H); 13C NMR (150 MHz, DMSO-d6) δ 161.22 (d, J = 238.9 Hz), 155.05, 153.39, 147.18, 146.20, 143.14, 126.53, 126.35, 125.96, 125.94, 125.11, 125.09, 117.62, 117.49, 71.00, 66.30, 62.45, 60.23, 29.70, 26.41, 22.25, 14.55. ESI-HRMS calcd. for C24H28FN4O [M + H]+ 407.2149, found: 407.2254.

Lead compound 3: Mp: 244.1-244.7°C 1H NMR (600 MHz, DMSO-d6) δ 8.19-8.17(td, J = 8.1, 1.2 Hz, 2H), 7.79-7.74 (m, 4H), 7.63 (td, J = 8.8 Hz, 2H), 7.57 (t, J = 5.6 Hz, 1H), 7.34(s, 1H), 7.29 - 7.24 (m, 2 H), 7.21 (dd, J = 4.0 Hz 2H), 7.13(t, J = 5.6 Hz, 1H),6.51 (d, J = 8.5 Hz, 1 H), 4.39 (t, J = 8.5, 5.1 Hz, 1 H), 2.85 (t, J = 7.0 Hz, 2 H), 1.66(t, J = 6.5 Hz, 3H), 1.20 4 42(d, J = 6.8 Hz, 2H). 13C NMR (150 MHz, DMSO-d6) δ 162.22 (d, J = 238.9 Hz), 157.47, 154.17, 148.14, 143.82, 139.05, 133.67, 130.45, 129.43, 126.93, 124.50, 122.56, 121.40, 119.53, 116.11, 115.94, 112.87, 108.70, 37.92, 26.50, 22.74, 18.21. ESI-HRMS calcd. for C26H26FN6O3 [M + H]+ 489.1972, found: 489.1797.

The comprehensive results of the MTT assays, Safety Index (SI)76,77, and in vitro inhibition studies for the lead compounds are summarized in Fig. 8b, c and Table S3 respectively, with rucaparib serving as the positive control.

The anti-proliferative efficacy of the compounds was determined using the MTT assay across various cell lines, including A549, OVCAR-3, HCT-116, and MCF-7. Notably, the lead compounds 2 and 3 demonstrated potent anti-proliferative activity against A549, HCT-116 and MCF-7 cells. The evaluation of the SI for the three lead compounds is presented in Table S3. The lead compounds 1 and 3 exhibit excessive toxicity across the four cell lines. Overall, the lead compounds 2 and 3 demonstrate consistently lower toxicity in various cell lines. Subsequent in vitro assays to evaluate PARP1 enzyme inhibition revealed that both lead compounds 2 and 3 exhibited nanomolar-level inhibitory activity. Notably, the lead compound 2 demonstrated superior inhibitory efficacy compared to the positive control rucaparib.

Discussion

To address the low synthesizability of molecules produced by many molecular generative models, we propose a novel synthesis-oriented generative model called ClickGen. Unlike similar synthesis-oriented generative models, we employ modular click chemistry and amidation as the primary reaction rules, complemented by reinforcement learning and inpainting techniques. This enables the model to generate molecules that are potentially novel, show a strong binding tendency to the given target, and are easily synthesizable. In the target-specific benchmark tasks involving the SARS-CoV-2 and ROCK1 targets, ClickGen demonstrates over 30% higher novelty and diversity in the generated molecules compared to other synthesis-oriented models and produces more than twice the number of novel scaffolds. ClickGen also achieves a 10% improvement in the synthesizability metrics compared to the baseline models and exhibits better similarity in physicochemical properties and docking performance. The subsequent conformational analysis and synthesizability evaluations reveal that ClickGen can generate novel molecules with docking conformations similar to reported inhibitors, and that these molecules can most likely be synthesized by following the model-recommended routes.

In the practical application of PARP1-targeted drug design, the molecules generated by ClickGen underwent virtual screening. Following this, we selected three lead compounds characterized by novel scaffolds. These compounds were successfully synthesized within a span of 10 days. The bioactivity analysis revealed that two lead compounds exhibit superior inhibitory effects on the proliferation of various cancer cell lines and PARP1 enzymatic activity, with reduced cytotoxicity, relative to the positive control Rucaparib. The ClickGen approach demonstrates the capability to rapidly design novel lead compounds with nanomolar-level activity while utilizing minimal resources. This signifies that the technologies employed in ClickGen represent a novel paradigm in overcoming the constraints of synthesizability and drug-likeness in the field of molecular generation methods.

While ClickGen represents a significant advancement in the construction of synthesis-driven generative models, there are still areas that require improvement. Firstly, the model’s coverage of reaction types is relatively limited. Although click chemistry and amidation have proven capable of constructing chemical databases at the billion-scale, the extensive use of these two reactions means that some molecules generated by ClickGen may still be subject to patent protection. Secondly, there is a dependence on reaction synthons. Regardless of the use of inpainting models, ClickGen relies heavily on publicly available initial reaction synthons. In future work, we plan to expand the range of reaction types within the chemical reaction generator by introducing more easily reproducible modular reactions. Additionally, we aim to enhance the generative capabilities of inpainting to reduce its reliance on the training dataset. Finally, we plan to integrate ClickGen with automated synthesis techniques. The reinforcement learning technology of ClickGen enables continuous learning and adaptation from new experimental data. This integration, combined with the flexibility of automated synthesis, would allow for ongoing optimization and iterative improvement of closed-loop drug and material discovery.

Methods

Preparation of dataset and building blocks

In order to train the model to generate compounds in accordance with click chemistry and amide formation, a training set encompassing molecules made by both reactions needs to be constructed first. As illustrated in Fig. S6, we first represented the 48.2 million compounds from the REAL Diversity database47 in SMARTS78,79 format. Utilizing the two reactions as templates, the compounds were segmented at amide bonds and 1,2,3-triazole rings. After converting the synthons to SMILES representation, distinct reactive sites of different synthons were labeled using specific symbols, such as [*0] or [*1]. We then used these synthons to form a combinatorial library, whose composition and characteristics are shown in Fig. S6.

Meanwhile, 917 ROCK1 inhibitors, 1203 SARS-CoV-2 Mpro inhibitors, 1154 AA2AR antagonists and 1667 PARP1 inhibitors (IC50 < 50 nM or Ki < 50 nM) were collected from BindingDB80, ChEMBL81 and the Protein Data Bank (PDB)82. These four inhibitor datasets were then docked into the binding pockets of ROCK1 (PDB ID: 6E9W83), SARS-CoV-2 Mpro (PDB ID: 7L1184), AA2AR (PDB ID: 3EML85) and PARP1 (PDB ID: 4BJC75), respectively. The protein structures were processed by the Protein Preparation wizard in Schrödinger, including selecting only chain A, adding all missing residues, heavy atoms and hydrogens at pH = 7.0, and minimizing the whole crystal structure until the root-mean-square deviation (RMSD) of the displacement of the non-hydrogen atoms was lower than 0.3 Å86,87. The 3D structure of each ligand was generated by the LigPrep module in Schrödinger.

Construction of the Chemical Reaction-based combiner

We trained a Chemical Reaction-based combiner using the REAL dataset, applying several criteria to filter the database: (1) –2 < logP < 7; (2) MW < 500; (3) HBA + HBD < 10; (4) TPSA < 150; (5) PAINS88 filter (used to filter out 10 compound classes with no prospects as drugs) and (6) MCFs filter37,38 (used to discard compounds containing unstable or reactive groups that could result in the formation of toxic metabolites or intermediates). Finally, 1.4 million compounds were collected for model training. This combiner is capable of assembling molecules by combining synthons based on the synthetic accessibility and reaction rules captured in the dataset. The combiner takes synthons or sub-structures as initial inputs and selects appropriate synthons from the list of building blocks. It consists of a sequence of a 128-layer fully connected layer, a ReLU activation function, another 128-layer fully connected layer, and a Sigmoid activation function. The model produces an output denoted as p, representing the predicted probability of selecting a synthon. After training, the model exhibits p values close to 1 for correctly selected synthons and values approaching 0 for incorrect selections. The training strategy adopted in our study, termed negative sampling56, is commonly employed in computational chemistry for predicting the physicochemical properties of compounds29,89,90. The primary training procedure is described below and illustrated in Fig. S7a:

(i) Determine whether the molecule contains disassemblable amide bonds and triazole rings, marking them as reaction sites. If disassemblable, the reaction sites at both ends are parsed into synthons and substructures, with the substructure proceeding to the subsequent step. If not disassemblable, the process is terminated.

(ii) Based on the disassembly locations, a synthon database is constructed using the training set, comprising n compatible synthons (positive samples) and N incompatible synthons (negative samples). Here, compatible synthons are defined as those with a Tanimoto similarity coefficient greater than 0.7 relative to the cleaved synthon in the library, whereas incompatible synthons have a similarity coefficient less than 0.4. The model calculates the loss value for these molecules using Formula (1). The process then revisits step (i) to determine whether to terminate the procedure, undergoing a total of 80 training steps with 10,000 molecules selected in each step.

$${{{{\mathscr{L}}}}}_{{block}}=-\frac{1}{n}{\sum }_{i=1}^{n}\log {p}_{{right},i}^{{block}}-\frac{1}{N}{\sum }_{i=1}^{N}\log \left(1-{p}_{{wrong},i}^{{block}}\right)$$
(1)
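The labeling rule in step (ii) and the negative-sampling loss can be sketched in a few lines of Python. The sketch assumes that Formula (1) is the usual binary cross-entropy over positive and negative samples, which matches the stated goal of p approaching 1 for correct synthons and 0 for incorrect ones. The function names and the small `eps` clamp are illustrative assumptions.

```python
import math


def label_synthon(similarity, pos_cutoff=0.7, neg_cutoff=0.4):
    """Label a candidate synthon by its Tanimoto similarity to the cleaved synthon.

    Returns 1 (compatible), 0 (incompatible) or None (in between, not used).
    """
    if similarity > pos_cutoff:
        return 1
    if similarity < neg_cutoff:
        return 0
    return None


def block_loss(p_right, p_wrong, eps=1e-12):
    """Negative-sampling loss over predicted probabilities.

    p_right: probabilities predicted for the n compatible synthons.
    p_wrong: probabilities predicted for the N incompatible synthons.
    eps only guards against log(0); it is not part of the formula.
    """
    pos = -sum(math.log(max(p, eps)) for p in p_right) / len(p_right)
    neg = -sum(math.log(max(1.0 - p, eps)) for p in p_wrong) / len(p_wrong)
    return pos + neg
```

Under this reading, a well-trained combiner gives a loss close to zero, and the loss grows as positives are scored low or negatives are scored high.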

Inpainting-based generative model

Image inpainting refers to the process where a model generates missing parts of an image based on clues from the surrounding background, ultimately yielding a complete and sensible picture91,92. Inspired by this concept, we developed an inpainting-based molecular generative model, which recommends suitable fragments to complete a partially built molecule with a missing part in its structure. To prepare training data for this model, we used RDKit to partition molecules from the REAL database by randomly splitting them along non-cyclic chemical bonds into three segments: Molleft, Molmid, and Molright. The overall framework of the inpainting model is inspired by U-net, adopting a U-shaped encoder-decoder architecture for molecular input-output processing (Fig. S7B). Within the network, we integrated a contextual attention mechanism, wherein skip connections link the decoder-predicted Molmid with Molleft and Molright from the encoder. Leveraging this contextual attention, the predicted Molmid in the decoder is substantially influenced by the rich chemical bond and atomic connectivity information from Molleft and Molright, leading to the prediction of more rational molecular structures.

The loss function for model training comprises both the reconstruction loss and the coherence consistency loss 93:

$${{{{\mathscr{L}}}}}_{{inpainting}}={{{{\mathscr{L}}}}}_{{recon}}+{{{{\mathscr{L}}}}}_{{consis}}$$
(2)

The \({{{{\mathscr{L}}}}}_{{recon}}\) ensures that the model can complete the molecule while preserving the structure in non-vacant areas. The \({{{{\mathscr{L}}}}}_{{consis}}\), on the other hand, ensures a smooth transition between the completed region and the adjacent molecular segments.

The calculation of \({{{{\mathscr{L}}}}}_{{recon}}\) is illustrated in Formula (3), where {\({lmol}\),\({mmol},{rmol}\)} represent the three parts of a molecule input during training, and {\(m^{\prime} {mol}\)} are the corresponding outputs from the model. Furthermore, M is a mask, defined as Formula (4). Here d denotes the distance of the current filling position, \({d}_{{total}}\) represents the full length of the middle segment, and σ is a quarter of the total length. The primary purpose of σ is to impose a lesser penalty on atoms that are farther away from the boundary.

$${{{{\mathscr{L}}}}}_{{recon}}=\sum M\odot {\Vert {mmol}-m^{\prime} {mol}\Vert }_{2}$$
(3)
$$M\left(d\right)=\exp \left(-\frac{1}{2}{\left(\frac{d}{\sigma }\right)}^{2}\right)+\exp \left(-\frac{1}{2}{\left(\frac{d-{d}_{{total}}}{\sigma }\right)}^{2}\right)$$
(4)
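The behaviour of the mask in Formula (4) is easy to check numerically. The following is a minimal sketch that evaluates M(d) with σ set to a quarter of the middle-segment length, as stated above. Treating d as an integer position running from 0 to d_total is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def boundary_mask(d, d_total):
    """Evaluate M(d) from Formula (4) with sigma = d_total / 4."""
    if d_total <= 0:
        raise ValueError("d_total must be positive")
    sigma = d_total / 4.0
    left = math.exp(-0.5 * (d / sigma) ** 2)               # peak at the left boundary
    right = math.exp(-0.5 * ((d - d_total) / sigma) ** 2)  # peak at the right boundary
    return left + right


def mask_weights(d_total):
    """Mask values for every integer position 0..d_total (assumed indexing)."""
    return [boundary_mask(d, d_total) for d in range(d_total + 1)]
```

For example, `mask_weights(8)` is symmetric about the midpoint, with the largest weights at the two boundaries and the smallest in the centre, which is the lesser penalty for atoms far from the boundary described in the text.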

To enhance the consistency of the output molecules, we incorporated Bidirectional Content Transfer (BCT) into the inpainting model. The BCT comprises a Long Short-Term Memory (LSTM) encoder (\({E}_{{BCT}}\)) and an LSTM decoder (\({D}_{{BCT}}\)). Given that the molecular structure in the central region should manifest a smooth transition from lmol to rmol, we first input \({f}_{{left}}\) and \({f}_{{right}}\) from the inpainting model into \({E}_{{BCT}}\) and \({D}_{{BCT}}\) for sequential prediction, yielding \({{f}^{\leftarrow}}_{{mid}}\) and \({\vec{f}}_{{mid}}\). The consistency loss is then computed as described in Formula (5).

$${{{{\mathscr{L}}}}}_{{consis}}=\sum {\Vert {f}_{{left}}-{{f}^{\leftarrow}}_{{left}}\Vert }_{2}+{\Vert {{f}^{\leftarrow}}_{{mid}}-{\vec{f}}_{{mid}}\Vert }_{2}+{\Vert {f}_{{right}}-{\vec{f}}_{{right}}\Vert }_{2}$$
(5)

The \({f}_{{left}}\) and \({f}_{{right}}\) represent hidden layers of the inpainting model. The values of \({{f}^{\leftarrow}}_{{left}}\), \({\vec{f}}_{{right}}\), \({{f}^{\leftarrow}}_{{mid}}\), and \({\vec{f}}_{{mid}}\) can be determined using Formulas (6) and (7).

$${{f}^{\leftarrow}}_{{mid}},{{f}^{\leftarrow}}_{{left}}={D}_{{BCT}}\left({f}_{{right}},{E}_{{BCT}}\left({f}_{{left}}\right)\right)$$
(6)
$${\vec{f}}_{{mid}},{\vec{f}}_{{right}}={D}_{{BCT}}\left({f}_{{left}},{E}_{{BCT}}\left({f}_{{right}}\right)\right)$$
(7)

The whole inpainting model was optimized by the Adam optimizer with a starting learning rate of \({10}^{-4}\) and trained for 50 epochs until convergence. The entire task took about 40 h on an NVIDIA GeForce RTX 4090 GPU.

Reinforcement learning model

To optimize the binding affinity of the generated molecules towards specific targets, reinforcement learning based on Vina docking (AutoDock4.2) scores is employed to guide molecular generation. We trained two variants of the prior model in reinforcement learning for comparison. The first relies solely on the reaction-based combiner described in section Construction of the Chemical Reaction-based combiner and assembles molecules from pre-existing synthons. The second integrates both the reaction-based combiner and the inpainting generative model discussed in section Inpainting-based generative model, and constructs molecules from generated synthons, ensuring that the fully assembled molecules are both synthesizable and novel. Given that the second prior model is a subset of the first, we will focus on detailing the construction of the first model.

For the inpainting procedure, we employ the Bemis-Murcko framework to mask the synthons referenced in Section Preparation of dataset. We ascertain whether a synthon possesses an R group. If present, the R group and its atoms are masked using the [*] character. In the absence of an R group, non-terminal atoms are randomly chosen for masking with the [*] character.

The UCB (Upper Confidence Bound) and UCT (UCB for Tree) algorithms within MCTS are frequently integrated into reinforcement learning frameworks94. This integration effectively combines tree search with learning derived from simulated episodes of experience, acquired through interactions with a model of the environment95. The MCTS process94 for assembling a synthon consists of three steps: (1) Selecting masked synthons and assembling them based on predefined rules; (2) Completing the masked portion via inpainting generative model; and (3) Considering step (1) as an action in Reinforcement Learning. The combination of steps (1) and (2) defines a state (s). After expansion, a reward score is assigned using the vina-score.

In the MCTS process, training is composed of four main stages, spanning 1000 steps in total:

Selection

Within a step, a combination path is selected according to Formula (8)24,96, where \(a\) represents all possible synthons to select from, \(Q\) denotes the action value, \(N\) is the total number of visits to the node, \(P\) is the predicted probability value for the synthon from the prior model, and \(c\) is the exploration coefficient, set at 1.3 in our case. After the synthon is selected, the masked portion undergoes inpainting.

$${a}_{t}=\underset{a}{{argmax}}\left\{\frac{Q\left({s}_{t},a\right)}{N\left({s}_{t},a\right)}+{cP}\left({s}_{t},a\right)\frac{\sqrt{N\left({s}_{t-1},{a}_{t-1}\right)}}{1+N\left({s}_{t},a\right)}\right\}$$
(8)
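The selection rule in Formula (8) can be written as a short Python sketch. It takes Q, N and P for each candidate synthon as given, uses the visit count of the parent node for the square-root term, and sets c to 1.3 as in the text. Returning zero for the exploitation term of an unvisited node is an assumption added here to avoid division by zero, and the function names are hypothetical.

```python
import math


def puct_score(q, n, prior, parent_visits, c=1.3):
    """Score one candidate action following Formula (8).

    q: action value Q(s_t, a); n: visit count N(s_t, a);
    prior: probability P(s_t, a) from the prior model;
    parent_visits: visit count of the parent node.
    """
    exploit = q / n if n > 0 else 0.0  # assumed convention for unvisited nodes
    explore = c * prior * math.sqrt(parent_visits) / (1 + n)
    return exploit + explore


def select_synthon(children, parent_visits, c=1.3):
    """Pick the synthon with the highest score.

    children maps a synthon identifier to a (Q, N, P) tuple.
    """
    return max(children, key=lambda a: puct_score(*children[a], parent_visits, c))
```

With a non-zero exploration coefficient, a rarely visited synthon with a high prior probability can outrank a frequently visited one; setting `c=0.0` reduces the rule to pure exploitation.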

Expansion

If the current leaf node is not a terminal node, additional valid synthon nodes are generated based on the reaction combination sites. One of these nodes is then selected for expansion.

Simulation

Starting from the expanded node, a molecule is assembled as the output, continuing until a final reward \({v}_{i}\) is obtained.

Backpropagation

The reward score from the expanded nodes is propagated back to all its parent nodes, updating the \(N\) and \(Q\) values of those nodes29,96.

$$Q\left({s}_{t},a\right)=\frac{1}{N\left({s}_{t},a\right)}{\sum }_{i=1}^{n}{r}_{i}$$
(9)
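The backpropagation stage can be sketched as follows. The sketch assumes that the action value is the mean of the rewards that have passed through a node, which is one reading of Formula (9), and that the expanded node itself is updated together with its ancestors. The `Node` class and its attribute names are hypothetical.

```python
class Node:
    """Minimal search-tree node holding a visit count and a reward sum."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0        # N
        self.reward_sum = 0.0  # running sum of the rewards r_i

    @property
    def q(self):
        """Mean reward of the node (assumed meaning of Q); 0.0 if unvisited."""
        return self.reward_sum / self.visits if self.visits else 0.0


def backpropagate(node, reward):
    """Propagate a reward from the expanded node up to the root."""
    while node is not None:
        node.visits += 1
        node.reward_sum += reward
        node = node.parent
```

Storing the reward sum rather than the mean keeps each update to two additions per node, and the mean is recovered on demand through the `q` property.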

Evaluation metrics

Several widely used benchmarks (Formulae 10 ~ 13) were used to evaluate the quality of the generated molecule set (G) against the existing (or training) set (E). Validity (Formula 10) was used to evaluate the validity rate of the SMILES strings in G, where \({N}_{G}\) and \({V}_{G}\) are the numbers of the generated molecules and valid SMILES strings in G, respectively. Novelty (Formula 11) was used to evaluate the proportion of the compounds that exist in G but not in E. Diversity was used to evaluate the diversity of the compounds in G, in which the last term in Formula 12 calculates the average Tanimoto coefficient (T) of the generated molecules. Fréchet ChemNet Distance (FCD) defined by Formula 13 was calculated to measure the similarity between the generated molecules in G and the training data, where \(m\) and \(\varSigma\) refer to the mean vectors and the covariance matrices of the activations on the penultimate layer of ChemNet, respectively.

Regarding the training of the BBAR method29, we strictly followed the instructions provided in the GitHub repository (https://github.com/jaechang-hits/BBAR-pytorch). We prepared the provided dataset of 210,000 molecule-protein dockings, recorded their scores, and created a .db file as the training set. Subsequently, we trained the model and generated molecules. For Synnet30, we utilized the target evaluation surrogate QSAR model28,59,60 (https://github.com/micahwang/GARel) and followed the instructions provided in the code repository (https://github.com/wenhao-gao/SynNet) for training, with all other parameters set to default.

$$\begin{array}{c}{G}_{{validity}}=\frac{\left|{V}_{G}\right|}{{N}_{G}}\end{array}$$
(10)
$$\begin{array}{c}{G}_{{novel}}=1-\frac{\left|{set}\left({V}_{G}\cap E\right)\right|}{\left|{V}_{G}\right|}\end{array}$$
(11)
$$\begin{array}{c}{G}_{{diversity}}=1-\frac{1}{{\left|{set}\left(V\right)\right|}^{2}}{\sum}_{\left(a,b\right)\in {set}\left(V\right)}T\left(a,b\right)\,\end{array}$$
(12)
$$\begin{array}{c}{FCD}\left(G,E\right)={{||}{m}_{G}-{m}_{E}{||}}_{2}^{2}+{Tr}\left({\varSigma }_{G}+{\varSigma }_{E}{-2\left({\varSigma }_{G}{\varSigma }_{E}\right)}^{\frac{1}{2}}\right)\end{array}$$
(13)
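The first three metrics can be computed directly from Formulae (10) to (12), as in the minimal Python sketch below. It assumes that molecules are compared as canonical SMILES strings and that fingerprints are available as sets of on-bits. Following the 1/|set(V)|² normalisation, the diversity sum is taken over all ordered pairs, including self-pairs, which is an interpretation of Formula (12). FCD is omitted because it requires the pretrained ChemNet model. All function names are hypothetical.

```python
from itertools import product


def validity(valid_smiles, n_generated):
    """Formula (10): fraction of generated strings that are valid."""
    return len(valid_smiles) / n_generated


def novelty(valid_smiles, existing_smiles):
    """Formula (11): fraction of valid molecules absent from the existing set."""
    overlap = set(valid_smiles) & set(existing_smiles)
    return 1 - len(overlap) / len(valid_smiles)


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diversity(fingerprints):
    """Formula (12): one minus the mean pairwise Tanimoto coefficient.

    fingerprints is an iterable of frozensets; duplicates are removed first.
    """
    unique = list(set(fingerprints))
    n = len(unique)
    total = sum(tanimoto(a, b) for a, b in product(unique, repeat=2))
    return 1 - total / (n * n)
```

Because self-pairs contribute a similarity of 1, the diversity computed this way stays slightly below 1 even for completely dissimilar molecules, and the gap shrinks as the set grows.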

Synthesis and bioactivity assessment of lead compounds

Synthesis

All solvents and reagents were commercially available and were used without further purification unless stated otherwise. The progress of the reactions was monitored by thin-layer chromatography on a glass plate coated with silica gel with a fluorescent indicator (GF254, Qingdao Ocean Chemicals, China). The melting points of the lead compounds were measured on an RD-1 melting point apparatus (Tianjin Guoming Medical Equipment Co., LTD, China). The 1H and 13C nuclear magnetic resonance (NMR) spectra were recorded on a model 600 Bruker Avance spectrometer (Bruker, Germany) at 600 and 150 MHz, respectively. Chemical shifts are given in parts per million (δ) referenced to DMSO-d6 at δ 2.50 for 1H and δ 39.5 for 13C. High-resolution mass spectra (HRMS) of target compounds were recorded on a Waters Q-TOF Premier spectrometer (Waters, USA). The synthesis routes were adapted from previously reported techniques97,98,99,100,101,102,103,104,105, with the detailed protocol for the lead compound provided in the Supplementary Information.

MTT assay

Human lung adenocarcinoma cell line (A549), human ovarian cancer cell line (OVCAR-3), human colon cancer cell line (HCT-116), and human breast cancer cell line (MCF-7) were purchased from the American Type Culture Collection (ATCC, USA). All cell lines were cultured in RPMI-1640 (Beijing Thermo Fisher Scientific Company, China) supplemented with 10% FBS at 37 °C and 5% CO2, and were tested for mycoplasma contamination every two weeks. A549, OVCAR-3, HCT-116, and MCF-7 cells were seeded in 96-well plates and then treated with lead compounds for 72 h. Then, 20 μL MTT (5 mg/mL, in PBS) was added into each well and, after incubation for 4 h, the resulting formazan was dissolved in 150 μL DMSO. Finally, the absorbance was measured with a microplate reader (490 nm).

Toxicity assay

Human normal alveolar epithelial cells (HPAEpiC), Human normal ovarian epithelial cells (IOSE80), Human normal intestinal epithelial cells (HIEC), Human normal breast cell (HTB-125) were cultured in 96-well plates. These cell lines were subsequently exposed to various concentrations of the lead compounds. The treatment protocol mirrored that of the MTT assay previously described.

Enzyme inhibitory activity assay

The PARP-1 inhibition assays were performed using the colorimetric 96-well PARP assay kits provided by BPS Bioscience, USA (Catalog No. 80580, 80581).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.