Molecular optimization using a conditional transformer for reaction-aware compound exploration with reinforcement learning

Nakamura, Shogo; Yasuo, Nobuaki; Sekijima, Masakazu

doi:10.1038/s42004-025-01437-x

Download PDF

Article
Open access
Published: 08 February 2025

Molecular optimization using a conditional transformer for reaction-aware compound exploration with reinforcement learning

Communications Chemistry volume 8, Article number: 40 (2025) Cite this article

15k Accesses
9 Citations
15 Altmetric
Metrics details

Subjects

Abstract

Designing molecules with desirable properties is a critical endeavor in drug discovery. Because of recent advances in deep learning, molecular generative models have been developed. However, the existing compound exploration models often disregard the important issue of ensuring the feasibility of organic synthesis. To address this issue, we propose TRACER, which is a framework that integrates the optimization of molecular property optimization with synthetic pathway generation. The model can predict the product derived from a given reactant via a conditional transformer under the constraints of a reaction type. The molecular optimization results of an activity prediction model targeting DRD2, AKT1, and CXCR4 revealed that TRACER effectively generated compounds with high scores. The transformer model, which recognizes the entire structures, captures the complexity of the organic synthesis and enables its navigation in a vast chemical space while considering real-world reactivity constraints.

Bridging the gap between chemical reaction pretraining and conditional molecule generation with a unified model

Article 05 December 2023

Targeted molecular generation with latent reinforcement learning

Article Open access 30 April 2025

High-throughput property-driven generative design of functional organic molecules

Article 06 February 2023

Introduction

Small molecules constitute one of the essential modalities in drug discovery. However, new drug candidates have become increasingly difficult to discover, and some take more than 12 years to develop and cost an average of $2.6 billion to bring to the market^1,2. In the early stages of research, hit compounds with high inhibitory activity against the target protein are often identified via high-throughput screening or virtual screening. These hit compounds are then structurally expanded by adding building blocks to explore compounds with the desired properties³. To find drug candidate compounds within a vast chemical space by starting from hit compounds,improving the efficiency of the DMTA cycle, which is a process used in drug discovery to design, make, test, and analyze compounds, is crucial.

Recent advances in deep learning models and their generalization capabilities have led to the development of generative models for de novo design and hit-to-lead optimization purposes⁴. However, most of these methods focus only on “what to make” and do not sufficiently consider “how to make”. Molecular generative models utilizing various architectures such as recurrent neural networks (RNNs)^5,6,7,8, variational autoencoders (VAEs)^9,10, and generative adversarial networks (GANs)^11,12, as well as molecular design methods that explicitly consider interactions with target proteins^13,14,15 have been developed. Since these methods do not consider how to synthesize the generated molecules, chemists must search for candidates for synthetic experiments from the numerous compounds suggested by the employed generative models; this aspect is a significant barrier preventing a shift to bench synthesis¹⁶.

The most commonly used scoring function for estimating synthetic ease, the SA score¹⁷, penalizes fragments, chiral centers, and certain substructures that are rarely seen in the PubChem database. Recently, Chen et al. developed a scoring method using building blocks and chemical reaction datasets to improve the SA score¹⁸. However, these topological methods are well suited for providing a rough estimate of the ease of synthesis in a high-throughput manner and cannot truly consider synthetic feasibility¹⁹. Although retrosynthesis models have been proposed to computationally predict synthetic pathways²⁰, they are unable to consider the complexity of chemical reactions; thus, the knowledge and experience of human experts are indispensable during the synthesis planning process²¹. Despite the recent advances in machine learning-based scoring functions^{22,23,24,25,26,27}, these functions predict only the results of retrosynthesis models. Therefore, the fundamental problems of these models remain unsolved²¹.

One approach for addressing the problem of generating synthesizable molecules is to construct virtual libraries through molecular design tasks with synthetic pathways by using predefined sets of reaction templates, which describe how reactants can be transformed into products^{28,29,30,31,32,33,34}. Advances in machine learning have led to the development of deep learning-based analogous methods in combination with optimization algorithms (e.g., genetic algorithms or Bayesian optimization)^35,36,37,38. These methods generate molecules that are similar to a given input compound in a descriptor space, such as extended connectivity fingerprints (ECFPs), and they also generate reaction pathways described by reaction templates. Reinforcement learning models, such as actor-critic and Monte Carlo tree search (MCTS), have been reported to be able to generate optimized compounds^39,40,41,42 by training neural networks to predict reaction templates and building blocks that maximize the value of the reward function. Recently, a template-based method utilizing GFlowNet was reported for generating compounds with a probability that is proportional to a reward value^43,44. However, the reaction templates used in these models are oversimplified, and chemoselectivity and regioselectivity, which are important aspects in real chemical reactions, are challenging to consider (Fig. 1a). Even though these selectivities may be considered by describing the patterns of the target reaction in detail, the scope of the substrates for all possible reactions is unrealistic to comprehensively define. Since new chemical reactions are reported daily, describing detailed chemical transformations is a time-consuming process.

**Fig. 1: Comparison of prior studies and the proposed method.**

As a data-driven approach, the development of structural optimization methods that can explicitly learn chemical reactions has the potential to address the above challenges. Bradshaw et al. proposed models that embed molecules or reaction trees into a latent space, which is a low-dimensional representation learned from the input data; here, similar data points (i.e., chemical structures or reaction trees) are embedded to make them close together^45,46. The Molecule Chef⁴⁵ learns the latent space from a “bag of reactants” (summation of reactant embeddings) and connects these reactants to the property of the product predicted by Molecular Transformer⁴⁷. Despite the fact that this process enables compound generation while considering real chemical reactions, it can only handle single-step reactions. DoG-Gen⁴⁶ has succeeded in mapping a latent vector of reaction trees via a hierarchical gated graph neural network (GGNN), achieving multistep synthetic pathway generation. However, in structural optimization cases, DoG-Gen cannot follow the drug discovery process; it is incapable of generating compounds by starting from specific hit compounds.

In this study, we propose TRACER, which is a molecular generation model that combines a conditional transformer and an MCTS. The transformer model is constructed on the basis of attention mechanism, which allows it to selectively and directly focus on the important components contained in the input sequence data⁴⁸. Therefore, the transformer model trained on chemical reactions such as the structural transformation contained in the simplified molecular input line entry system (SMILES)⁴⁹ is expected to recognize the substructures that affect reactions from the whole reactants, achieving high accuracy in forward synthesis prediction tasks⁴⁷. Previously developed data-driven methods^45,46 generate optimized structures in the latent space, so they can search for compounds with the desired properties in a structure that smoothly fills a continuous space (Fig. 1b). In contrast with previous works that used end-to-end architectures, our proposed framework decouples the transformer as a product prediction model from the MCTS, which is used as a structural optimization algorithm. Therefore, TRACER enables direct optimization on the basis of virtual chemical reactions; thus, it is expected to propose structures and synthetic pathways that are far from the training dataset (Fig. 1c). Our model learns 1000 reaction types that are associated with a real chemical conversions, enabling the model to learn more fine-grained information than that of the rule-based methods, which use approximately 100 reaction templates^{35,36,38,39,40,41,42}. With the continuous growth of chemical reaction databases, the model needs to be applied to an increasingly diverse range of reactions. However, transformer models have been verified to follow scaling laws⁵⁰ in natural language processing tasks, indicating that increasing the number of model parameters can effectively address this challenge.

Results and discussion

Effect of the conditional token on chemical reactions

In this study, the transformer model was trained on molecular pairs created from chemical reactions via the SMILES sequences of reactants and products as source and target molecules, respectively. Following the evaluation method proposed by Yoshikai et al.⁵¹, partial accuracy (the accuracy per SMILES token) and perfect accuracy (the accuracy per molecule) were calculated for the validation data to assess the ability of the model to recognize partial and whole product structures. The impact of including or excluding reaction template information on the accuracy of the model was investigated (Fig. 2). The partial accuracy quickly reached approximately 0.9, regardless of the presence or absence of the reaction template information; this result indicated the ability of the model to learn partial product structures from the reactants, which aligned with the chemical knowledge that the scaffolds of reactants often remain in the products. However, the perfect accuracy increased more slowly than the partial accuracy did in both the unconditional and conditional models; thus, learning the entire product structure required more time. The perfect accuracy plateaued at approximately 0.6 for the conditional model and 0.2 for the unconditional model. These results indicated that the model had difficulty reproducing the chemical reactions from the training data without the reaction templates, and the accuracy improved when this information was provided. These results indicate the fact that a single substrate can undergo numerous chemical reactions and yield various products in the absence of additional constraints.

**Fig. 2: Evaluations of the conditional transformer (orange) and unconditional transformer (blue).**

The top-n accuracies attained by the models for the test data were calculated (Table 1). The top-1, top-5, and top-10 accuracies were improved by adding information to the reaction templates. These results indicate that knowledge regarding the chemical reactions narrows the chemical space for predicting molecules and improves the ability of the model to generate appropriate products from the learned chemical reactions.

Table 1 The top-n accuracies achieved by the transformer models

Full size table

In some cases, bias was observed in the products generated by the unconditional transformer from the reactants. Figure 3a shows an example where the top-10 products were generated by a beam search with a beam width of 100 in the trained unconditional transformer, where the SMILES sequence of the depicted reactant was used as the input. The generated products were biased toward compounds transformed by reactions involving Csp(3) bonds to an indole-type nitrogen atom; these results were attributed to the bias in the chemical reactions involving similar transformations in the USPTO dataset. In contrast, by using the conditional transformer and the reaction templates predicted by the GCN, compounds could be generated through diverse chemical reactions (Fig. 3b). The GCN selected not only combinations of the reactants and reaction templates included in the training data, such as epoxide ring opening reactions and Buchwald-Hartwig cross-coupling with vinyl halides but also combinations of compounds and reaction templates that were not present in the training data, such as amidation and sulfonamidation, and successfully generated reasonable product candidates. These results indicated that the conditional transformer can extract chemical reaction knowledge from conditional tokens and propose rational products even for unknown combinations of reactants and chemical reactions. An example of the product generation results produced under the condition of unmatched reaction templates, where no partial structure matches were found, is presented in Supplementary Fig. 1.

**Fig. 3: Examples of the generated molecules.**

Optimization of compounds involving the generation of synthetic routes targeting specific proteins via a QSAR model

Five starting materials were selected for DRD2, AKT1, and CXCR4 from the USPTO 1k TPL dataset. The detailed procedure is described in Section 4.5, and the selected molecules are shown in Fig. 4. Utilizing these reactants as the root nodes, the MCTS calculations from selection to backpropagation were executed for 200 steps. The number of reaction templates predicted in the expansion step was set to 10. These hyperparameters could be adjusted on the basis of the available computational resources.

**Fig. 4: The set of molecules selected as starting materials.**

The compounds generated by the MCTS were evaluated in terms of the beam width in the transformer inference process, the total number of generated molecules (Total), the ratio of the number of unique molecules to those contained in the USPTO dataset, the number of molecules (Num) and the proportion of molecules with active prediction probabilities exceeding the threshold of 0.5 imposed by the QSAR model (Tables 2, 3, and 4). In addition to these metrics, to confirm the depth of the MCTS, the proportions of the reaction steps for the generated compounds were investigated (Fig. 5). A uniqueness analysis revealed that TRACER efficiently generated compounds that were distinct from those in the USPTO database across all target proteins. The efficiency of generating compounds with QSAR values exceeding 0.5 was significantly influenced by the starting compound. For DRD2 and AKT1, when the starting material had a high QSAR value, the structural exploration process started from scaffolds similar to those of the active compounds learned by the QSAR model, leading to a higher ratio of discovered molecules with high reward values. However, even when starting with compounds possessing low QSAR values, TRACER successfully generated compounds with high QSAR values by adjusting the beam width of the transformer. It was difficult to discover optimized compounds for CXCR4 when starting from compounds with low QSAR values (starting materials 11 and 12). However, when starting from compounds 13-15, our model successfully generated compounds with QSAR values exceeding 0.5 at a rate of approximately 15%. Starting from compounds 11, 12, and 15, it was challenging to improve the QSAR value. This is likely because the number of known CXCR4 ligands is relatively small, and few scaffolds are similar to the active compounds learned by the QSAR model that can access from these starting materials, thus limiting the opportunities for further development.

Table 2 Structural optimization results obtained for DRD2

Full size table

Table 3 Structural optimization results obtained for AKT1

Full size table

Table 4 Structural optimization results obtained for CXCR4

Full size table

**Fig. 5: Distribution of the reaction steps.**

Overall, the optimal beam width for the conditional transformer model was found to vary depending on the starting compound. As shown in Fig. 5, increasing the beam width led to a reduced depthwise exploration process. In contrast, a narrower beam width was observed to increase the depth, leading to a greater number of reaction steps. These results indicated that, in some cases, promoting depthwise exploration by narrowing the tree width was more effective for discovering molecules with high reward values than dispersing the nodes to be explored by widening the tree width (starting materials 2-3, 6, 9, and 11). Conversely, in other cases, increasing the beam width facilitated the generation of high-reward molecules by promoting breadth-wise exploration (starting materials 4, 10, and 13). These findings demonstrate that TRACER is capable of performing structural optimization when starting from various initial compounds by optimizing the beam width.

To investigate the potential impact of increasing the beam width on the diversity of the generated molecules, the internal diversity of the compound sets that exceeded the threshold was calculated. In some cases, decreasing diversity trends were observed when the beam width was increased (starting materials 1, 3, 7, 9, and 12), whereas no notable influences were observed for the other materials.

In our model, since some reactants are treated implicitly, the reactants that are newly added to the input compounds were also investigated. For all pathways generated in Tables 2, 3, and 4, a total of 20,765 diverse reactants were identified, enabling this model to generate molecules with unique structures relative to those of the USPTO dataset.

Reaction pathways generated by the model

Figure 6 shows the compounds with the highest reward values and their synthesis routes for each of the starting materials. In these synthetic pathways, all reactants were not explicitly addressed. Therefore, the availability of the building blocks shown in these figures was investigated via Reaxys⁵², and all the starting materials were found to be commercially available (Supplementary Figs. 2–16). This result indicated that a wide range of reactants in USPTO are purchasable or synthesizable and that the conditional transformer captured this feature even though the model implicitly learned part of the reactants.

**Fig. 6: Compounds with the highest QSAR values for DRD2 and their synthesis routes.**

Starting from 1, a series of reactions, including a nucleophilic attack of an alcohol on an aryl chloride and reductive amination were selected; this resulted in the generation of 1c with a QSAR value of 0.962. During the synthesis of 1a, the aryl fluoride scaffold of 1 and the newly added 2-chloropyrazine (Supplementary Fig. 2) could compete in an aromatic nucleophilic substitution reaction. However, several similar reactions are contained in the USPTO dataset, where aliphatic alcohols react with chloropyrazine, and an actual patents also exist⁵³. This result indicates that TRACER correctly predicted the selectivity of the aromatic nucleophilic substitution reaction by learning real chemical reactions. Predicting such selectivity is extremely difficult with reaction template-based methods that determine applicable reaction templates without considering real chemical reactions. For compound 2, sulfonamidation of the amine was utilized to synthesize 2a, followed by conducting nucleophilic substitution and a nucleophilic attack on the alkyl halide to yield 2c. In the case of 3, 3a was generated by amidation with pivaloyl chloride. 3a was converted to 3b via Friedel-Crafts acylation. Generally, when an sp³ carbon is bonded to an aromatic ring, Friedel-Crafts reactions proceed selectively at the para position⁵⁴, which the conditional transformer model successfully predicted. The derivation from 4 involved the amidation and two subsequent nucleophilic attacks of heteroatoms on alkyl halides to produce 4c. When 5 was used as the starting material, the introduction of the aliphatic amine moiety on the piperazine nitrogen using a phthalimide derivative provided 5b, and subsequent amidation provided 5c with a QSAR value of 0.994.

The results of the experiments conducted using the QSAR models of AKT1 and CXCR4 are shown in Figs. 7 and 8, respectively. Several examples were found in these reaction pathways, where the selectivities of the chemical reactions were correctly predicted. In the transformation from 8a to 8b, a method for reducing a nitro group while retaining a cyano group has been reported⁵⁵. During the synthesis of 8d, the substitution at the 3-position of pyridine and the ortho-para directing effect of an aniline were correctly predicted. In the synthesis from 9a to 9b, similar reactions in which the ortho position of the amide groups in the substrates is halogenated using a metal catalyst^56,57 have been reported. For 11a, the exact same reaction has been reported, in which reductive amination occurs while the alcohol is retained⁵⁸. In the transformation from 11a to 11b, the aryl fluoride undergoes a substitution reaction while retaining the aldehyde of the newly added building block; indeed, several such examples exist^59,60. As shown in these examples, TRACER can predict the selectivity of chemical reactions by explicitly learning actual chemical reactions.

**Fig. 7: Compounds with the highest QSAR values for AKT1 and their synthesis routes.**

**Fig. 8: Compounds with the highest QSAR values for CXCR4 and their synthesis routes.**

Although the structural optimization process using an unconditional transformer was explored, the generated synthetic routes proved impractical due to the use of several unreasonable transformations, such as multiple-step reactions performed in a single inference, the insertion of carbon atoms into alkyl chains, and a disregard for selectivity. Therefore, the retention rate of the Murcko scaffold⁶¹ of the reactants was investigated for the routes generated via structural optimization, with the QSAR model of DRD2 as a reward function. The retention rate was 94.5% for the conditional transformer, whereas it was 43.3% for the unconditional transformer. Considering these results together with the perfect accuracy investigation conducting in Section 5.3, it can be concluded that conditioning on chemical reactions is crucial.

Comparative analysis with molecular generation models that incorporate synthetic route generation

To evaluate TRACER, structural optimization experiments were conducted on the QSAR models of DRD2, AKT1, and CXCR4 along with benchmark models (the Molecule Chef⁴⁵, DoG-Gen⁴⁶, CasVAE³⁷, and SynFlowNet⁴³). A comparison among these models is presented in Table 5. The specification of starting compounds, which is a crucial function in drug discovery, is supported only by our framework. Furthermore, the models that are capable of explicitly learning chemical reactions are limited to TRACER, the Molecule Chef, and DoG-Gen. Notably, the Molecule Chef is applicable only to single-step reactions. Despite the fact that CasVAE and SynFlowNet, which are rule-based methods that utilize reaction templates, can perform structural optimization accompanied by the generation of reaction pathways, they are unable to consider actual chemical reactions. The optimization methods proposed by each model are described in the following section.

Table 5 Comparison among the features of the benchmarking models. SM stands for starting material

Full size table

Molecule chef

In the Molecule Chef, the reactants involving in chemical reactions are vectorized via graph neural networks (GNNs). A network for mapping to a latent space is learned by utilizing a “bag of reactants”, which is obtained by summing these vectorized representations. In the original paper, a property predictor was connected to the latent space, and the coordinates were updated by calculating the gradient of the latent variable to generate optimized molecules. For comparative experiments, the property predictor was trained by precalculating the QSAR values of the training dataset used in the original paper, which was derived from USPTO dataset.

DoG-Gen

DoG-Gen is the only model among the benchmark models in this comparative study that is capable of explicitly learning chemical reactions and handling multistep transformations. The mapping to the latent space was learned by encoding reaction trees, constructed from the USPTO dataset, with a two-stage Gated Graph Neural Network (GGNN). In the benchmark experiments, following the methodology of the original paper, 30 rounds of operations were performed. Each round involved sampling 7000 reaction trees from DoG-Gen and conducting fine-tuning twice with the top 1500 molecules ranked by their QSAR values. The same training dataset as that employed in the original paper, which was derived from USPTO, was used in this experiment.

CasVAE

In CasVAE, separate latent spaces are constructed for compounds and reaction trees, utilizing the pathways obtained by decomposing the molecules in the USPTO dataset with a reaction template-based retrosynthesis model⁶². Decoding is performed by using the latent variables sampled from each space. As per the original implementation, five batches of 3000 molecules were generated by applying Bayesian optimization to the latent space of the molecules.

SynFlowNet

SynFlowNet is a GFlowNet-based model that was designed to sample data points with a probability proportional to a reward value. The tasks of predicting applicable reaction templates for compounds and reaction partners for bimolecular reaction templates are performed via neural networks, and the structure expansion procedure based on chemical reactions is modeled as a Markov process. The original paper utilized building blocks derived from Enamine building blocks⁶³. In this comparative experiment, the training data were replaced with the USPTO dataset using the same preprocessing procedure as the original paper to standardize the conditions. The experimental conditions were set to be identical to those in the original implementation; the batch size during the generation step was 64, the temperature parameter was 32, and the number of training steps was 1000.

Experimental results

For all the benchmarked models, their compound generation effects were investigated via QSAR models of DRD2, AKT1, and CXCR4 (Table 6). For TRACER, the optimized beam widths for each starting material presented in Tables 2–4 were used in combination. As additional evaluation metrics beyond those used in Section 2.2, the proportion of unique generated compounds (uniqueness) and the Fréchet ChemNet distance (FCD)⁶⁴ were calculated to evaluate the similarity between the generated compound set and the training dataset. Cases where the numbers of unique compounds were less than 5,000 were excluded because the FCD was shown to be unreliable in the original paper when the number of compounds was small, particularly when it was below this threshold.

Table 6 Comparison among the molecular generative models targeting each protein

Full size table

When DRD2 was targeted, TRACER presented the greatest compound uniqueness and the largest proportion of generated USPTO-unique molecules. SynFlowNet generated a substantial number of duplicate compounds, resulting in only 10.9% unique compounds. DoG-Gen exhibited a USPTO-unique compound ratio of 41.3%, indicating a large number of generated compounds overlapping with the training data due to its learning of a mapping from the reaction trees to the latent space. Furthermore, the FCD values of both the Molecule Chef and DoG-Gen were low, suggesting that methods that construct latent spaces via reaction trees tend to explore regions that are close to the training data. SynFlowNet exhibited the highest FCD, which can be attributed to its reaction template-based approach, where various reaction templates can be applied to the starting materials to increase the FCD. A high FCD of 14.3 was also observed for TRACER, suggesting its success in the challenging task of generating compounds that are dissimilar to the training data while explicitly learning chemical reactions. CasVAE showed the best results in terms of the proportion of compounds with QSAR values greater than 0.5 and the overall diversity. This is likely because CasVAE performs Bayesian optimization solely on the latent space of the compound structures, thus performing a simpler task that does not consider the latent representations of reaction trees. Consequently, as shown in Section 2.4.3, CasVAE attempted to generate 15,000 reaction trees but yielded only 615 valid trees. This was mainly due to the selection of incorrect reaction templates during the reaction tree generation process and the failure to construct complete reaction pathways, resulting in a lack of high-throughput capabilities.

When targeting AKT1 and CXCR4, TRACER presented the highest values for the USPTO-unique compound ratio and FCD. This result indicates that TRACER robustly explored a vast chemical space with respect to the reward function while considering real chemical reactions. With respect to the proportion of compounds exceeding the QSAR threshold, CasVAE presented the highest value for AKT1, whereas DoG-Gen presented the highest proportion for CXCR4. As described in Section 5.2, the numbers of active compounds for DRD2, AKT1, and CXCR4 were 9,913, 4,035, and 925, respectively. This suggests that the performance of CasVAE was significantly influenced by the number of known ligands, which made it difficult to discover optimized molecules for CXCR4. DoG-Gen exhibited the highest degree of diversity, which is presumably attributed to the absence of an initial compound selection phase, allowing for a greater degree of freedom when exploring the chemical space. TRACER attained moderate performance in terms of the proportions of compounds generated with high QSAR values (17.0% for DRD2, 25.5% for AKT1, and 13.5% for CXCR4) compared with those of DoG-Gen and CasVAE. Notably, this ability significantly depends on the initially selected compounds. To evaluate the optimization ability of our framework with respect to the reward function, compounds with low QSAR values were intentionally set as the starting compounds. However, the other methods do not have such strong constraints, allowing for a greater degree of flexibility during the search process.

The means and standard deviations of the SA scores and molecular weights for the compounds generated by each model are shown in Table 7. With respect to the SA scores, all the models exhibited comparable performance, except for SynFlowNet, which targeted CXCR4. Although the SA score includes a size penalty term¹⁷, which was designed to penalize larger molecules, TRACER generated compounds with higher molecular weights than those of the Molecule Chef and DoG-Gen while maintaining comparable SA scores.

Table 7 Comparison among the SA scores and molecular weights of different models

Full size table

Structural optimization experiments involving the QSAR model starting from unseen molecules

Structural optimization experiments designed to simulate the actual drug discovery process were conducted. A dataset consisting of nonpatented-like compounds was obtained from the ZINC database as described in Section 5.3. The compounds with the highest ligand efficiency (LE), which is defined as the absolute docking score normalized by the number of heavy atoms, were then selected as the starting materials for each protein. As a result, 16 was selected for DRD2 (docking score = − 5.1, LE = 0.729) and AKT1 (docking score = − 4.5, LE = 0.643), whereas 17 was selected for CXCR4 (docking score = − 4.7, LE = 0.671) (Fig. 9).

**Fig. 9: Set of molecules selected as starting materials (16 for DRD2 and AKT1, 17 for CXCR4).**

The results of the experiments conducted using compounds 16 and 17 as starting materials are presented in Table 8. Although these initial compounds exhibited high LE, their QSAR values were 0.00, 0.01, and 0.01 for DRD2, AKT1, and CXCR4, respectively. However, TRACER enhanced the reward values by optimizing the structures of these compounds. Through a beam width investigation, compounds with QSAR values exceeding the threshold were generated at rates of 11.2% (beam width = 10) for DRD2, 24.5% (beam width = 10) for AKT1, and 9.81% (beam width = 50) for CXCR4. Consistent with the results presented in Section 2.2, the majority of the generated compounds were not included in the training dataset.

Table 8 Results of the experiment that started from unseen molecules in the ZINC dataset

Full size table

For each target protein, the compound sets generated by TRACER were compared with known ligands. Pairwise Tanimoto coefficients were calculated between the known ligands and the compounds generated by TRACER with QSAR values greater than 0.5. The compound pairs with the highest similarity, the synthetic routes for the generated compounds, and the docking poses are shown in Fig. 10. Although these known ligands were not included in the training data, compounds with Tanimoto similarities of 0.838, 0.588, and 0.552 were generated for DRD2, AKT1, and CXCR4, respectively. For these generated compounds, TRACER generated synthetic routes consisting of 2, 3, and 4 steps. Although the QSAR model does not explicitly consider docking scores, these generated compounds exhibit comparable binding potential to active ligands. These results indicate that our framework can effectively generate optimized compounds for the reward function, even when starting from molecules that are not included in the training dataset. Furthermore, compounds that are structurally similar to active ligands that were not included in the training dataset could be generated via the QSAR model, along with by the generation of reaction pathways.

**Fig. 10: Structures and docking poses of the most similar compound pairs.**

Conclusions

In this study, a conditional transformer model was developed for the prediction of diverse chemical reactions. Compared with the unconditional model, the proposed model was shown to significantly improve the accuracy of the output product predictions. By incorporating reaction template information, the model was able to generate molecules through a wide variety of chemical reactions.

TRACER, which is a combination of a conditional transformer, a GCN, and an MCTS, was investigated for its ability to optimize compounds targeting specific proteins through the generation of synthetic pathways. MCTS-based molecule generation experiments using the QSAR model to predict the activity of DRD2, AKT1, and CXCR4 demonstrated the applicability of the conditional transformer in terms of drug discovery. MCTSs with various search widths showed that, in some cases, widening the search width enabled the efficient generation of compounds with high reward values. In contrast, in other cases, narrowing the search width facilitated a deeper search process, leading to the discovery of molecules with high reward values. Furthermore, the model performed structural transformations by appending commercially available building blocks, although it implicitly learned some of the reactants in the dataset. In benchmarking experiments, TRACER explored a chemical space farther from the training data than the other models did. Moreover, TRACER successfully generated compounds with high reward values, even when starting from compounds that were absent in the training data and with QSAR values close to zero. These results indicate the robustness of our framework against variations in the starting compounds and target proteins. The integration of additional models, such as a reaction condition recommendation model^65,66 or a multiobjective optimization model⁶⁷ for lead optimization, is expected to further accelerate the process of optimizing novel compounds in drug discovery tasks. Future work is needed to improve the robustness of the proposed approach by optimizing the model with larger datasets and exploring its potential applications involving diverse reward functions.

Methods

Figure 11a shows the procedure used by the proposed model to generate feasible products from the given reactant and to optimize the structure on the basis of virtual chemical reactions. First, a graph convolutional network (GCN) is used to predict the applicable reaction templates for the reactant. To filter mismatched templates, those with no substructures that match the reactant are removed at this time. Second, the transformer performs molecular generation under the conditions of the reaction type predicted for the reactant. This operation enables the model to be aware of how the reactant should be transformed into a product by learning real chemical reactions that correspond to the conditional templates. The number of outputs for each reaction type is examined in Section 2. The reaction templates with only one suitable product, such as the deprotection of nucleophilic substituents, are manually annotated with indices of the reaction templates. These templates are assumed to generate only a single molecule. Fig. 11b shows an overview of the optimization framework using an MCTS. In this study, the MCTS uses simulations to selectively grow a search tree and determine the most promising actions, in conjunction with the transformer and GCN. It searches for the optimal compounds through four steps: selection, expansion, simulation, and backpropagation. In the selection step, the most promising molecule is determined via the upper confidence bound (UCB) score described in Section 4.3. In the expansion step, the applicable reaction templates for the molecule are expanded, and the transformer performs conditional forward synthesis-based prediction for each reaction template. In the simulation step, the potential of the newly generated molecules is evaluated by further exploring the tree. Finally, the estimated values are propagated back to the root node, ensuring that molecules with higher values are more likely to be selected in the next cycle. The computing environment for all the experiments was an Ubuntu 22.04 OS, an Intel(R) Xeon(R) Gold 5318Y CPU, 128 GB of RAM, and an NVIDIA GeForce RTX 4090 GPU (with 24 GB of memory).

**Fig. 11: Comprehensive overview of an integrated model incorporating a conditional transformer and an MCTS.**

Conditional transformer model

In recent years, the development of compound generation models based on transformer architectures⁴⁸ has significantly advanced^68,69. These models utilize attention mechanisms to learn long-range dependencies in sequential data such as SMILES strings. Notably, the Molecular Transformer⁴⁷ achieved high accuracy in forward chemical reaction prediction tasks. It was trained on the USPTO dataset to predict the products from the given reactants. Unlike the previously developed forward synthesis prediction models that generate products when provided with all starting materials, our model was developed to predict compounds that can be synthesized from a single starting material through various chemical reactions.

The conditional transformer was employed to control the diversity of virtual chemical reactions. Previous studies on transformer models have reported that the properties of generated molecules can be successfully controlled by appending conditional tokens corresponding to the desired properties^70,71,72. In this study, indices of the 1,000 reaction templates extracted from the USPTO 1k TPL dataset⁷³ were added to the beginning of the input SMILES sequences, and the transformer was trained to predict the product corresponding to the reaction (Fig. 12). These manipulations were performed to control the diversity of the chemical reactions by conditioning various reaction templates on the reactants. As described in the Section 5.1, the reaction dataset considers only a single reactant as an explicit input. However, in most cases where partner compounds exist, they can be determined from the reaction template and the structural differences between the reactant and the product. This training approach enabled the model to learn all the necessary information for describing the chemical reactions while avoiding the need to select partner compounds from tens of thousands of potentially existing molecules. Consequently, the model is expected to generate plausible structural transformations in reference to real chemical reactions.

**Fig. 12: Procedure used by the conditional transformer to learn structural transformations under the conditions of the reaction template.**

To clarify the effect of the conditional tokens, conditional and unconditional transformer models were implemented via PyTorch⁷⁴ and torchtext libraries. The hyperparameters were determined according to the original transformer model⁴⁸ as follows: the batch size was 128, the dimensionality of the model was 512, the dropout ratio was 0.1, the numbers of layers in the encoder and decoder were 6, the dimensionality of the feedforward network was 2048, the activation function was a rectified linear unit (ReLU), and the number of attention heads was 8.

Graph convolutional network (GCN)

The GCN model⁷⁵ was implemented on the basis of a previous study that predicted reaction templates from the SMILES representations of compounds⁷⁶. Molecules with n nodes are represented as ${{{\mathcal{M}}}}\equiv (A,E,F)$, where A ∈ {0, 1}^n×n is the adjacency matrix, $F\in {{\mathbb{R}}}^{n\times d}$ is the matrix of the node features, and E ∈ {0, 1}^n×n×T is the edge tensor when T types of bonds are present. Using these representations and A = {A^(t)∣t ∈ T}, the molecules can be represented as ${{{{\mathcal{M}}}}}^{{\prime} }\equiv ({{{\bf{A}}}},F)$, where ${({{{{\bf{A}}}}}^{(t)})}_{i,j}=1$ if a bond of type t is present between nodes i and j. The architecture of a GCN typically consists of convolution layers, dense layers, and aggregation layers.

Graph convolution layer

A graph convolution is computed as follows:

$${{{{\bf{X}}}}}^{(l+1)}=\sigma \left({\sum}_{t}{\tilde{{{{\bf{A}}}}}}^{(t)}{{{{\bf{X}}}}}^{(l)}{{{{\bf{W}}}}}_{t}^{(l)}\right)$$

where ${\tilde{{{{\bf{A}}}}}}^{(t)}$ is a normalized adjacency matrix for bond type t, X^(l) is the input matrix of the lth layer, and ${{{{\bf{W}}}}}_{t}^{(l)}$ is the parameter matrix for the lth layer with a bond type t.

Dense layer

After applying the graph convolution, the output X^(l+1) is typically passed through a dense layer, which is also known as a fully connected layer. The dense layer independently applies a linear transformation to the features of each node. The output X^(l+1) can be expressed as follows:

$${{{{\bf{X}}}}}^{(l+1)}={{{{\bf{X}}}}}^{(l)}{{{{\bf{W}}}}}^{(l)}$$

where X^(l) is the input matrix, and W^(l) is the weight matrix of the dense layer.

Aggregation layer

After the graph convolution and dense layers, the resulting node representations need to be aggregated to obtain a graph-level representation. Following previous work⁷⁶, sum aggregation was employed. The output is calculated as follows:

$${\left({{{{\bf{X}}}}}^{(l+1)}\right)}_{j}={\sum}_{j}{\left({{{{\bf{X}}}}}^{(l)}\right)}_{ij}$$

where ${\left({{{{\bf{X}}}}}^{(l+1)}\right)}_{j}$ is the jth element of the vector representation. The graph-level representation X^(l+1) obtained through sum aggregation can be further processed to produce the final output of the GCN model. Additional dense layers are applied to predict the reaction templates. The overall architecture of the GCN model is shown in Fig. 13.

**Fig. 13: Architecture of the GCN model.**

For the reactants included in the training data, the GCN model was trained to predict the correct reaction templates from 1,000 available templates. This implies that the random selection of a reaction template would result in an accuracy of 0.001. The hyperparameters of the GCN model were determined via Optuna⁷⁷ (Table 9). Finally, the trained GCN model was evaluated on the test data, yielding a top-1 accuracy of 0.483, a top-5 accuracy of 0.771, and a top-10 accuracy of 0.862.

Table 9 The hyperparameter ranges in the GCN, where the values selected for the best model are shown in bold

Full size table

Monte Carlo Tree Search (MCTS)

The MCTS algorithm⁷⁸ was utilized to generate optimized compounds for the reward function. MCTSs integrated with RNNs, genetic algorithms, or matched molecular pairs^{5,6,7,79,80,81} have been demonstrated to be effective in molecular optimization tasks. In our framework, the nodes represent molecules, and the paths involving any nodes represent synthetic routes. The MCTS algorithm consists of four steps, each of which is detailed below.

1.
Selection: Child nodes are iteratively selected by starting from the root node based on its estimated value and visit count. The tree policy, which is a score function for selecting child nodes, plays an important role in the performance of the MCTS. One commonly used metric is the UCB score, which was originally proposed for multiarm bandit problems. This approach balances the exploration of nodes with fewer visits and the exploitation of promising nodes. The UCB score for the node i is calculated via the following formula:

$$UCB={{{{\rm{arg\,max}}}}}_{i}\left\{Q({s}_{i})+2{C}_{p}\sqrt{\frac{\ln N({s}_{p})}{N({s}_{i})}}\right\}$$

where Q(s) represents the mean estimated value of state s for its child nodes and N(s) denotes the number of visits to state s. The states of each node i and its parent node are represented as s_i and s_p, respectively. C_p balances the trade-off between exploitation and exploration. The first term in the UCB score corresponds to exploitation which encourages the selection of nodes with higher estimated values. The second term corresponds to exploration, which promotes the selection of the less-visited nodes. This selection policy guides the search toward the most valuable regions of the search space, enabling the discovery of the molecules that are likely to have the desired properties.

2.
Expansion: The products generated via virtual chemical reactions using the transformer are added as new child nodes to the selected node. In the conditional transformer, the reaction templates that pass the substructure matching filter after sampling from the GCN are used as conditional tokens. In the unconditional transformer, the model simply predicts the products without considering the reaction templates. A beam search, which is a heuristic search algorithm that explores a graph by expanding the most promising nodes, is employed to obtain the feasible products, and the influence of the beam width on the molecular optimization process is investigated.
3.
Simulation: The values of the child nodes generated in the expansion step are estimated. Virtual chemical reactions are repeatedly performed by the transformer a predetermined number of times. The number of reaction templates sampled during the simulation step is set to 5, and a single molecule is generated for each reaction template. The number of reaction steps explored is set to 2, and the maximum reward value among the generated compounds (25 or fewer) is set as the value for each node.
4.
Backpropagation: The values calculated via the simulation step are propagated for all nodes along the path from the generated nodes to the root node, and the cumulative scores are calculated. The nodes in the direction of the promising compounds contained within the search tree are highly evaluated by this step; this step enables the efficient generation of compounds that are optimized for the evaluation function.

In this study, the quantitative structure-activity relationship (QSAR) model, which predicts the activity probability of a target protein from a chemical structure within the range from 0 to 1, was used as the reward function. The diversity of the generated compounds was evaluated via the following equation, as described in a previous benchmark study of molecular generative models⁸²:

$${{\mbox{IntDiv}}}_{p}(G)=1-\root P \of{{\frac{1}{| G{| }^{2}}{\sum}_{{m}_{1},{m}_{2}\in G}T({m}_{1},{m}_{2})}}$$

where G is the set of valid compounds generated and T is the Tanimoto similarity between compounds m₁ and m₂ calculated by ECFP4 (2048-bit).

QSAR model

The activity prediction models developed for dopamine receptor D2 (DRD2), AKT serine/threonine kinase 1 (AKT1), and C-X-C motif chemokine receptor 4 (CXCR4) were used as the reward functions for the MCTS. A random forest classifier was trained to predict the probability of an active compound on the basis of 2048-bit ECFP6 via active/inactive labeled datasets for each protein described in Section 5.2. The training dataset consisted of 9006 active and 306,457 inactive compounds for DRD2, 3623 active and 14,814 inactive compounds for AKT1, and 853 active and 3051 inactive compounds for CXCR4. The hyperparameters of these models were optimized via Optuna⁷⁷ (Table 10). The performance of the trained model was evaluated by the area under the curve (AUC), which is a performance metric that evaluates the ability of a binary classification model to identify classes across a classification threshold, on a test dataset consisting of 907 active and 33,943 inactive compounds for DRD2, 412 active and 1,636 inactive compounds for AKT1, and 72 active and 355 inactive compounds for CXCR4. The models achieved an accuracy of 0.984 and an AUC of 0.703 for DRD2, an accuracy of 0.958 and an AUC of 0.897 for AKT1, and an accuracy of 0.974 and an AUC of 0.924.

Table 10 The ranges of the hyperparameters included in the random forest and the selected values for each target protein

Full size table

Selection of the starting compounds for the MCTS

The starting materials used by the MCTS for compound generation in Section 2.2 were randomly sampled from the distribution of the QSAR values for DRD2, AKT1, and CXCR4. Molecules with more than 8-membered rings and molecular weights ≥ 300 were removed from the USPTO 1k TPL dataset. Since reactants require functional groups that readily undergo chemical reactions, those not containing halogens, carbonyl groups, unsaturated bonds, or nucleophilic substituents were also removed; this decreased the number of compounds from 890,230 to 374,675. For each protein target, five initial compounds were selected as follows: the QSAR values of the filtered compounds were calculated, and one molecule was randomly sampled from each QSAR value bin including of 0–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5. These selected compounds are described in Section 2.2.

Data

USPTO dataset

In this study, the USPTO 1k TPL dataset reported by Schwaller et al. in their reaction classification work⁷³ was used as a chemical reaction dataset. USPTO 1k TPL was derived from the USPTO database created by Lowe⁸³ and consists of 445,000 reactions and corresponding reaction templates. The reaction templates were curated by selecting the 1000 most frequent template hashes obtained through atom mapping and template extraction via RxnMapper⁸⁴. Certain reagents and solvents (e.g., hydrochloric acid, ethyl acetate, and dichloromethane) were removed from the dataset to further preprocess the data. The molecular pairs with the highest Tanimoto similarity between the reactants and products of each reaction were subsequently formed, indicating a structural transformation from the reactants to the products. The index of the labeled reaction template for each data point was appended to the beginning of the SMILES string of the reactants. The source code of TRACER, the activity prediction model, and the curated dataset are available in our public repository at https://github.com/sekijima-lab/TRACER.

Dataset for QSAR modeling

The dataset for the QSAR modeling process targeting DRD2 was extracted from ExCAPE⁸⁵ and ChEMBL⁸⁶. In addition to the active/inactive labeled dataset derived from ExCAPE, a set of compounds from ChEMBL was added as active compounds. On the basis of the literature⁸⁷, the active compounds retrieved from ChEMBL underwent filtration with the following parameters: standard relation equal to “=”, pChEMBL > = 6.0, and molecular weight < 750. The AKT1 and CXCR4, active/inactive labeled datasets acquired from DUD-E⁸⁸ were combined with active compounds from ChEMBL. The active compounds obtained from ChEMBL were preprocessed via the same filtering criteria as those applied to DRD2. The final DRD2 dataset consisted of 9913 active and 340,400 inactive compounds, the AKT1 dataset consisted of 4035 active and 16,450 inactive compounds, and the CXCR4 dataset consisted of 925 active and 3406 inactive compounds.

Dataset and docking simulations for structural optimization starting from unseen molecules

To demonstrate the ability of our framework to generate optimized compounds when starting from unseen molecules, a dataset of molecules not included in the training set was created. This dataset was constructed by removing the compounds used for training the transformer and QSAR models from the building blocks that were available in the ZINC database^89,90. After applying the same filtering process described in Section 4.5, docking simulations were performed via Vina-GPU 2.0⁹¹ with DRD2 (PDB ID: 6CM4), AKT1 (PDB ID: 3CQW), and CXCR4 (PDB ID: 3ODU). The compounds with the highest ligand efficiency (absolute docking score divided by the number of heavy atoms) were then selected as the starting materials for each protein.

Data availability

The activity prediction model and the curated dataset utilized in this study are available in our public repository at https://github.com/sekijima-lab/TRACER.

Code availability

The source code and trained models for TRACER have been released under the MIT license and are available in our public repository at https://github.com/sekijimalab/TRACER.

References

DiMasi, J. A., Grabowski, H. G. & Hansen, R. W. Innovation in the pharmaceutical industry: new estimates of r&d costs. J. Health Econ. 47, 20–33 (2016).
Article PubMed Google Scholar
Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009-2018. JAMA 323, 844 (2020).
Article PubMed PubMed Central Google Scholar
Bleicher, K. H., Böhm, H.-J., Müller, K. & Alanine, A. I. Hit and lead generation: beyond high-throughput screening. Nat. Rev. Drug Discov. 2, 369–378 (2003).
Article CAS PubMed Google Scholar
Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).
Article Google Scholar
Yang, X., Zhang, J., Yoshizoe, K., Terayama, K. & Tsuda, K. Chemts: an efficient python library for de novo molecular generation. Sci. Technol. Adv. Mater. 18, 972–976 (2017).
Article CAS PubMed PubMed Central Google Scholar
Erikawa, D., Yasuo, N. & Sekijima, M. Mermaid: an open source automated hit-to-lead method based on deep reinforcement learning. J. Cheminform. https://doi.org/10.1186/s13321-021-00572-6 (2021).
Erikawa, D., Yasuo, N., Suzuki, T., Nakamura, S. & Sekijima, M. Gargoyles: An open source graph-based molecular optimization method based on deep reinforcement learning. ACS Omega 8, 37431–37441 (2023).
Article CAS PubMed PubMed Central Google Scholar
Loeffler, H. H. et al. Reinvent 4: Modern ai–driven generative molecule design. J. Cheminformatics 16, 20 (2024).
Article Google Scholar
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Article PubMed PubMed Central Google Scholar
Lim, J., Ryu, S., Kim, J. W. & Kim, W. Y. Molecular generative model based on conditional variational autoencoder for de novo molecular design. J. Cheminform. https://doi.org/10.1186/s13321-018-0286-7 (2018).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 361, 360–365 (2018).
Article CAS PubMed Google Scholar
Prykhodko, O. et al. A de novo molecular generation method using latent vector based generative adversarial network. J. Cheminform. https://doi.org/10.1186/s13321-019-0397-9 (2019).
Ozawa, M., Nakamura, S., Yasuo, N. & Sekijima, M. Iev2mol: Molecular generative model considering protein–ligand interaction energy vectors. J. Chem. Inf. Modeling 64, 6969–6978 (2024).
Article CAS Google Scholar
Chiba, S. et al. A prospective compound screening contest identified broader inhibitors for sirtuin 1. Sci. Rep. https://doi.org/10.1038/s41598-019-55069-y (2019).
Yoshino, R. et al. Discovery of a hidden trypanosoma cruzi spermidine synthase binding site and inhibitors through in silico, in vitro, and X-ray crystallography. ACS Omega (2023).
Schneider, G. & Clark, D. E. Automated de novo drug design: are we nearly there yet? Angew. Chem. Int. Ed. 58, 10792–10803 (2019).
Article CAS Google Scholar
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminformatics 1, 1–11 (2009).
Article Google Scholar
Chen, S. & Jung, Y. Estimating the synthetic accessibility of molecules with building block and reaction-aware sascore. J. Cheminformatics 16, 83 (2024).
Article Google Scholar
Stanley, M. & Segler, M. Fake it until you make it? generative de novo design and virtual screening of synthesizable molecules. Curr. Opin. Struct. Biol. 82, 102658 (2023).
Article CAS PubMed Google Scholar
de Almeida, A. F., Moreira, R. & Rodrigues, T. Synthetic organic chemistry driven by artificial intelligence. Nat. Rev. Chem. 3, 589–604 (2019).
Article Google Scholar
Strieth-Kalthoff, F. et al. Artificial intelligence for retrosynthetic planning needs both data and expert knowledge. J. Am. Chem. Soc. https://doi.org/10.1021/jacs.4c00338 (2024).
Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. Scscore: synthetic complexity learned from a reaction corpus. J. Chem. Inf. Model. 58, 252–261 (2018).
Article CAS PubMed Google Scholar
Voršilák, M., Kolár^, M., Čmelo, I. & Svozil, D. Syba: Bayesian estimation of synthetic accessibility of organic compounds. J. Cheminformatics 12, 1–13 (2020).
Article Google Scholar
Thakkar, A., Chadimová, V., Bjerrum, E. J., Engkvist, O. & Reymond, J.-L. Retrosynthetic accessibility score (rascore)–rapid machine learned synthesizability classification from ai driven retrosynthetic planning. Chem. Sci. 12, 3339–3349 (2021).
Article CAS PubMed PubMed Central Google Scholar
Yu, J. et al. Organic compound synthetic accessibility prediction based on the graph attention mechanism. J. Chem. Inf. Model. 62, 2973–2986 (2022).
Article CAS PubMed Google Scholar
Wang, S., Wang, L., Li, F. & Bai, F. Deepsa: a deep-learning driven predictor of compound synthesis accessibility. J. Cheminformatics 15, 103 (2023).
Article CAS Google Scholar
Skoraczyński, G., Kitlas, M., Miasojedow, B. & Gambin, A. Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning. J. Cheminformatics 15, 6 (2023).
Article Google Scholar
Klingler, F.-M. et al. SAR by space: Enriching hit sets from the chemical space. Molecules 24, 3096 (2019).
Article CAS PubMed PubMed Central Google Scholar
Grygorenko, O. O. et al. Generating multibillion chemical space of readily accessible screening compounds. iScience 23, 101681 (2020).
Article CAS PubMed PubMed Central Google Scholar
Nicolaou, C. A., Watson, I. A., Hu, H. & Wang, J. The proximal lilly collection: Mapping, exploring and exploiting feasible chemical space. J. Chem. Inf. Model. 56, 1253–1266 (2016).
Article CAS PubMed Google Scholar
Hu, Q. et al. Pfizer global virtual library (pgvl): a chemistry design tool powered by experimentally validated parallel synthesis information. ACS Combinat. Sci. 14, 579–589 (2012).
Article CAS Google Scholar
Vinkers, H. M. et al. Synopsis: synthesize and optimize system in silico. J. Med. Chem. 46, 2765–2773 (2003).
Article CAS PubMed Google Scholar
Hartenfeller, M. et al. Dogs: reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 8, e1002380 (2012).
Article CAS PubMed PubMed Central Google Scholar
Spiegel, J. O. & Durrant, J. D. Autogrow4: an open-source genetic algorithm for de novo drug design and lead optimization. J. Cheminformatics 12, 1–16 (2020).
Article Google Scholar
Button, A., Merk, D., Hiss, J. A. & Schneider, G. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nat. Mach. Intell. 1, 307–315 (2019).
Article Google Scholar
Gao, W., Mercado, R. & Coley, C. W. Amortized tree generation for bottom-up synthesis planning and synthesizable molecular design. In International Conference on Learning Representations https://openreview.net/forum?id=FRxhHdnxt1 (2022).
Nguyen, D. H. & Tsuda, K. Generating reaction trees with cascaded variational autoencoders. J. Chem. Phys. 156, 044117 (2022).
Noh, J. et al. Path-Aware and Structure-Preserving generation of synthetically accessible molecules. In Chaudhuri, K. et al. (eds.) Proceedings of the 39th International Conference on Machine Learning, vol. 162 of Proceedings of Machine Learning Research, 16952-16968 (PMLR, 2022).
Gottipati, S. K. et al. Learning to navigate the synthetically accessible chemical space using reinforcement learning. In International conference on machine learning, 3668-3679 (PMLR, 2020).
Horwood, J. & Noutahi, E. Molecular design in synthetically accessible chemical space via deep reinforcement learning. ACS Omega 5, 32984–32994 (2020).
Article CAS PubMed PubMed Central Google Scholar
Fialková, V. et al. Libinvent: Reaction-based generative scaffold decoration for in silico library design. J. Chem. Inf. Modeling 62, 2046–2063 (2021).
Article Google Scholar
Swanson, K. et al. Generative ai for designing and validating easily synthesizable and structurally novel antibiotics. Nat. Mach. Intell. 6, 338–353 (2024).
Article Google Scholar
Cretu, M., Harris, C., Roy, J., Bengio, E. & Liò, P. Synflownet: Towards molecule design with guaranteed synthesis pathways. In ICLR 2024 Workshop on Generative and Experimental Perspectives for Biomolecular Design (2024).
Wang, Y., Zhang, H., Zhu, J., Li, Y. & Feng, L. Rgfn: Recurrent graph feature network for clickbait detection. In 2021 International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS), 151-156 (IEEE, 2021).
Bradshaw, J., Paige, B., Kusner, M. J., Segler, M. & Hernández-Lobato, J. M. A model to search for synthesizable molecules. Adv. Neural Inf. Process. Syst. 32, 12000–12010 (2019).
Bradshaw, J., Paige, B., Kusner, M. J., Segler, M. & Hernández-Lobato, J. M. Barking up the right tree: an approach to search over molecule synthesis dags. Adv. neural Inf. Process. Syst. 33, 6852–6866 (2020).
Google Scholar
Schwaller, P. et al. Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019).
Article CAS PubMed PubMed Central Google Scholar
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 6000–6010 (2017).
Weininger, D. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. J. Chem. Inf. Computer Sci. 28, 31–36 (1988).
Article CAS Google Scholar
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Google Scholar
Yoshikai, Y., Mizuno, T., Nemoto, S. & Kusuhara, H. Difficulty in chirality recognition for transformer architectures learning chemical structures from string representations. Nat. Commun. 15, 1197 (2024).
Article CAS PubMed PubMed Central Google Scholar
Reaxys. https://www.reaxys.com.
Yi, X. et al. Substituted Oxaborole Antibacterial Agents. WO Patent 2011/17125 A1 (2011).
Sartori, G. & Maggi, R. Use of solid catalysts in friedel- crafts acylation reactions. Chem. Rev. 106, 1077–1104 (2006).
Article CAS PubMed Google Scholar
Quinn, J. F., Bryant, C. E., Golden, K. C. & Gregg, B. T. Rapid reduction of heteroaromatic nitro groups using catalytic transfer hydrogenation with microwave heating. Tetrahedron Lett. 51, 786–789 (2010).
Article CAS Google Scholar
Kianmehr, E. & Afaridoun, H. Nickel (ii)-and silver (i)-catalyzed c–h bond halogenation of anilides and carbamates. synthesis 53, 1513–1523 (2021).
Article CAS Google Scholar
Singh, H., Sen, C., Sahoo, T. & Ghosh, S. C. A visible light-mediated regioselective halogenation of anilides and quinolines by using a heterogeneous cu-mno catalyst. Eur. J. Org. Chem. 2018, 4748–4753 (2018).
Article CAS Google Scholar
Legnani, L. & Morandi, B. Direct catalytic synthesis of unprotected 2-amino-1-phenylethanols from alkenes by using iron (ii) phthalocyanine. Angew. Chem. Int. Ed. 55, 2248–2251 (2016).
Article CAS Google Scholar
Cantello, B. C. et al. [[. omega.-(heterocyclylamino) alkoxy] benzyl]-2, 4-thiazolidinediones as potent antihyperglycemic agents. J. Med. Chem. 37, 3977–3985 (1994).
Fan, Y.-H. et al. Structure–activity requirements for the antiproliferative effect of troglitazone derivatives mediated by depletion of intracellular calcium. Bioorg. Med. Chem. Lett. 14, 2547–2550 (2004).
Article CAS PubMed Google Scholar
Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).
Article CAS PubMed Google Scholar
Chen, B., Li, C., Dai, H. & Song, L. Retro*: learning retrosynthetic planning with neural guided a* search. In International conference on machine learning, 1608–1616 (PMLR, 2020).
Enamine building blocks. https://enamine.net/building-blocks/building-blocks-catalog. Accessed on 2024/11/05.
Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S. & Klambauer, G. Fréchet chemnet distance: a metric for generative models for molecules in drug discovery. J. Chem. Inf. Model. 58, 1736–1741 (2018).
Article CAS PubMed Google Scholar
Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).
Article CAS PubMed PubMed Central Google Scholar
Shim, E. et al. Predicting reaction conditions from limited data through active transfer learning. Chem. Sci. 13, 6655–6668 (2022).
Article CAS PubMed PubMed Central Google Scholar
Fu, T., Xiao, C., Li, X., Glass, L. M. & Sun, J. Mimosa: Multi-constraint molecule sampling for molecule optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, 35, 125–133 (2021).
Pang, C., Qiao, J., Zeng, X., Zou, Q. & Wei, L. Deep generative models in de novo drug molecule generation. J. Chem. Inf. Modeling 64, 2174–2194 (2023).
Article Google Scholar
Yang, X. et al. Gpmo: Gradient perturbation-based contrastive learning for molecule optimization. In IJCAI, 4940-4948 (2023).
Yang, Y. et al. SyntaLinker: automatic fragment linking with deep conditional transformer neural networks. Chem. Sci. 11, 8312–8322 (2020).
Article CAS PubMed PubMed Central Google Scholar
Wang, J. et al. Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning. Nat. Mach. Intell. 3, 914–922 (2021).
Article Google Scholar
He, J. et al. Transformer-based molecular optimization beyond matched molecular pairs. J. Cheminformatics. https://doi.org/10.1186/s13321-022-00599-3 (2022).
Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).
Article Google Scholar
Paszke, A. et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Infor. Process. Syst. 32, 8024–8035 (2019).
Kipf, T. N. & Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations, ICLR ’17 https://openreview.net/forum?id=SJU4ayYgl (2017).
Ishida, S., Terayama, K., Kojima, R., Takasu, K. & Okuno, Y. Prediction and interpretable visualization of retrosynthetic reactions using graph convolutional networks. J. Chem. Inf. modeling 59, 5026–5033 (2019).
Article CAS Google Scholar
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2623-2631 (2019).
Browne, C. B. et al. A survey of monte carlo tree search methods. IEEE Trans. Comput. Intell. AI games 4, 1–43 (2012).
Article Google Scholar
Jensen, J. H. A graph-based genetic algorithm and generative model/monte carlo tree search for the exploration of chemical space. Chem. Sci. 10, 3567–3572 (2019).
Article CAS PubMed PubMed Central Google Scholar
Sun, M. et al. Molsearch: search-based multi-objective molecular generation and property optimization. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, 4724-4732 (2022).
Suzuki, T., Ma, D., Yasuo, N. & Sekijima, M. Mothra: Multiobjective de novo molecular generation using monte carlo tree search. J. Chem. Inf. Model. 64, 7291–7302 (2024).
Article CAS PubMed PubMed Central Google Scholar
Polykovskiy, D. et al. Molecular sets (MOSES): A benchmarking platform for molecular generation models. Front. Pharmacol. 11, 565644 (2020).
Article CAS PubMed PubMed Central Google Scholar
Lowe, D. Chemical reactions from u.s. patents (1976-sep2016), 2017.https://figshare.com/articles/dataset/Chemical_reactions_from_US_patents_1976-Sep2016_5104873.
Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H. & Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 7, eabe4166 (2021).
Article PubMed PubMed Central Google Scholar
Sun, J. et al. Excape-db: an integrated large scale dataset facilitating big data analysis in chemogenomics. J. Cheminformatics 9, 1–9 (2017).
Google Scholar
Zdrazil, B. et al. The chembl database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic acids Res. 52, D1180–D1192 (2024).
Article CAS PubMed Google Scholar
Papadopoulos, K., Giblin, K. A., Janet, J. P., Patronov, A. & Engkvist, O. De novo design with deep generative models based on 3d similarity scoring. Bioorg. Medicinal Chem. 44, 116308 (2021).
Article CAS Google Scholar
Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of useful decoys, enhanced (dud-e): better ligands and decoys for better benchmarking. J. Medicinal Chem. 55, 6582–6594 (2012).
Article CAS Google Scholar
Irwin, J. J. et al. Zinc20-a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).
Article CAS PubMed PubMed Central Google Scholar
Zinc database. https://files.docking.org/bb/.
Ding, J. et al. Vina-gpu 2.0: further accelerating autodock vina and its derivatives with graphics processing units. J. Chem. Inf. Model. 63, 1982–1998 (2023).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This work was partially supported by a Grant-in-Aid for Transformative Research Areas (A) “Latent Chemical Space” JP24H01760 for M.S. from the Ministry of Education, Culture, Sports, Science and Technology, Japan; the Research Support Project for Life Science and Drug Discovery (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Number JP24ama121026 for M.S.; the Japan Society for the Promotion of Science (JSPS) under KAKENHI Grant Number JP20H00620 for M.S.; and JST SPRING Grant Number JPMJSP2106 for S.N. We would like to thank Springer Nature Author Services for their professional English language editing (Verification code: 9754-750B-ED41-C1BD-E29P).

Author information

Authors and Affiliations

Department of Life Science and Technology, Institute of Science Tokyo, 4259–J3–23, Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Kanagawa, Japan
Shogo Nakamura
Academy for Convergence of Materials and Informatics (TAC-MI), Institute of Science Tokyo, S6-23, Ookayama, Meguro-ku, 152-8550, Tokyo, Japan
Nobuaki Yasuo
Department Computer Science, Institute of Science Tokyo, 4259–J3–23, Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Kanagawa, Japan
Masakazu Sekijima

Authors

Shogo Nakamura
View author publications
Search author on:PubMed Google Scholar
Nobuaki Yasuo
View author publications
Search author on:PubMed Google Scholar
Masakazu Sekijima
View author publications
Search author on:PubMed Google Scholar

Contributions

S.N. designed the research and M.S. supervised the computational study. S.N. developed the software, and N.Y. guided the experiments. S.N. wrote the manuscript. All authors critically reviewed and revised the manuscript draft and approved the final version for submission.

Corresponding author

Correspondence to Masakazu Sekijima.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Chemistry thanks Fang Bai and the other, anonymous, reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Nakamura, S., Yasuo, N. & Sekijima, M. Molecular optimization using a conditional transformer for reaction-aware compound exploration with reinforcement learning. Commun Chem 8, 40 (2025). https://doi.org/10.1038/s42004-025-01437-x

Download citation

Received: 26 August 2024
Accepted: 28 January 2025
Published: 08 February 2025
Version of record: 08 February 2025
DOI: https://doi.org/10.1038/s42004-025-01437-x

This article is cited by

MolMod: a molecular modification platform for molecular property optimization via fragment-based generation
- Yao Zhou
- Zhipei Sang
- Shuheng Huang
Molecular Diversity (2025)

Subjects

Abstract

Similar content being viewed by others

Bridging the gap between chemical reaction pretraining and conditional molecule generation with a unified model

Targeted molecular generation with latent reinforcement learning

High-throughput property-driven generative design of functional organic molecules

Introduction

Results and discussion

Effect of the conditional token on chemical reactions

Optimization of compounds involving the generation of synthetic routes targeting specific proteins via a QSAR model

Reaction pathways generated by the model

Comparative analysis with molecular generation models that incorporate synthetic route generation

Molecule chef

DoG-Gen

CasVAE

SynFlowNet

Experimental results

Structural optimization experiments involving the QSAR model starting from unseen molecules

Conclusions

Methods

Conditional transformer model

Graph convolutional network (GCN)

Graph convolution layer

Dense layer

Aggregation layer

Monte Carlo Tree Search (MCTS)

QSAR model

Selection of the starting compounds for the MCTS

Data

USPTO dataset

Dataset for QSAR modeling

Dataset and docking simulations for structural optimization starting from unseen molecules

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Supplementary information

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

MolMod: a molecular modification platform for molecular property optimization via fragment-based generation

Search

Quick links