A data-driven generative strategy to avoid reward hacking in multi-objective molecular design

Yoshizawa, Tatsuya; Ishida, Shoichi; Sato, Tomohiro; Ohta, Masateru; Honma, Teruki; Terayama, Kei

doi:10.1038/s41467-025-57582-3

Download PDF

Article
Open access
Published: 11 March 2025

A data-driven generative strategy to avoid reward hacking in multi-objective molecular design

Nature Communications volume 16, Article number: 2409 (2025) Cite this article

8778 Accesses
4 Citations
62 Altmetric
Metrics details

Subjects

Abstract

Molecular design using data-driven generative models has emerged as a promising technology, impacting various fields such as drug discovery and the development of functional materials. However, this approach is often susceptible to optimization failure due to reward hacking, where prediction models fail to extrapolate, i.e., fail to accurately predict properties for designed molecules that considerably deviate from the training data. While methods for estimating prediction reliability, such as the applicability domain (AD), have been used for mitigating reward hacking, multi-objective optimization makes it challenging. The difficulty arises from the need to determine in advance whether the multiple ADs with some reliability levels overlap in chemical space, and to appropriately adjust the reliability levels for each property prediction. Herein, we propose a reliable design framework to perform multi-objective optimization using generative models while preventing reward hacking. To demonstrate the effectiveness of the proposed framework, we designed candidates for anticancer drugs as a typical example of multi-objective optimization. We successfully designed molecules with high predicted values and reliabilities, including an approved drug. In addition, the reliability levels can be automatically adjusted according to the property prioritization specified by the user without any detailed settings.

Targeted molecular generation with latent reinforcement learning

Article Open access 30 April 2025

Molecular optimization using a conditional transformer for reaction-aware compound exploration with reinforcement learning

Article Open access 08 February 2025

A systematic study of key elements underlying molecular property prediction

Article Open access 13 October 2023

Introduction

Molecular design using generative models has recently gained significant attention as a promising technology^1,2,3. This approach has been applied to diverse fields, such as drug discovery^{4,5,6,7,8,9,10} and the development of functional materials^{11,12,13,14,15,16,17,18,19}. Such a design can be formulated as an inverse problem¹, i.e., designing molecules with desired properties. The key is to evaluate the designed molecules (calculate the reward function) using methods like molecular simulation or data-driven predictive modeling, to guide the design process. While simulation-based evaluation, grounded in physicochemical principles, is generally accurate, it can be computationally expensive. Therefore, data-driven evaluation, based on supervised learning prediction models, is frequently used². In addition, practical molecular design often requires the simultaneous optimization of multiple properties. This can be formulated as a multi-objective optimization problem²⁰, and molecular designs based on generative models and data-driven evaluations have been reported using this approach^4,5,6,7,9,10.

However, data-driven molecular design using prediction models faces the risk of reward hacking²¹. This phenomenon, well-known in reinforcement learning and game AI, occurs when optimization deviates unexpectedly from the intended goals^22,23. It arises when a reward function, used to guide optimization, produces unintended outputs due to inputs that deviate significantly from expected scenarios. A similar issue can arise in molecular design guided by a prediction model, leading to the design of molecules that diverge from the model’s training data, resulting in reward hacking^21,24,25. Consequently, the optimization process deviates from its intended direction. In such cases, the seemingly favorable predicted value of the designed molecule may be inaccurate. In fact, in drug design, there have been cases where unstable or complex molecules, distinct from existing drugs, have been designed due to reward hacking^24,26.

To address the issue of reward hacking, strategies have been developed to assess and improve the reliability of prediction models. The reliability of these models has long been formulated as applicability domains (ADs)^{27,28,29,30,31,32,33} and has recently gained attention in the context of prediction uncertainty^{34,35,36,37,38,39,40}. The AD is defined as “the response and chemical structure space in which the model makes predictions with a given reliability”²⁷, and predictions within the AD are expected to maintain a certain level of accuracy. The trade-off between the size of an AD and the reliability of a prediction can be adjusted by changing the reliability level (Fig. 1a). A common approach to ensuring reliability is to guide the design of molecules within the AD of a single prediction model^41,42. Additionally, a strategy for performing multi-objective optimization by considering prediction reliability has been reported, where multiple ADs are merged into a single AD (Fig. 1b), and molecular designs are then performed within this merged AD⁹.

**Fig. 1: The concept of applicability domain (AD) and difficulties of multi-objective optimization incorporating multiple ADs.**

However, even when focusing on prediction reliability to mitigate reward hacking, maintaining reliability in multi-objective optimization is not straightforward. First, the strategy of merging multiple ADs is undesirable except in cases where multiple prediction models are trained on the same dataset. As prediction models have unique ADs depending on their training data, each AD should be considered separately when multiple prediction models are employed. In cases where multiple ADs are handled separately, as shown in the leftmost case of Fig. 1c, when ADs are defined for each prediction model at high reliability levels, they may not overlap if the training data for each model are distant in chemical space. In such situations, it is inherently difficult to generate molecules that fall within all ADs. As a compromise, it may be possible to overlap multiple ADs by lowering the reliability levels (center and right cases in Fig. 1c). However, excessively low reliability levels may produce unreliable molecules, and thus, higher reliability levels are desirable. A further complication is the difficulty in determining the appropriate reliability level before performing the molecular design (Fig. 1d). De novo molecular design may be successful even if no molecules have been reported in the potential overlap of the ADs (upper right of Fig. 1d). However, there is no guarantee that the generative model will always succeed in designing molecules within the potential overlap (middle right of Fig. 1d). If the reliability levels are too strict (bottom right of Fig. 1d), the molecular design will be difficult, but whether it is actually impossible cannot be determined until the design process is executed.

To overcome this difficulty, we developed an optimization framework called DyRAMO (Dynamic Reliability Adjustment for Multi-objective Optimization), which performs multi-objective optimization while maintaining the reliability of multiple prediction models. DyRAMO was designed to adjust the appropriate reliability level for each objective, achieving a balance between high prediction reliability and predicted properties of the designed molecules. In DyRAMO, the reliability levels were explored through repeated molecular designs. To make this exploration efficient, Bayesian optimization (BO)⁴³ was integrated into this approach. To demonstrate the efficacy of DyRAMO, we designed epidermal growth factor receptor (EGFR) inhibitors while maintaining high reliability for three properties: inhibitory activity against EGFR, metabolic stability, and membrane permeability. Appropriate reliability levels were efficiently explored using BO, and promising molecules, including known inhibitors, were successfully designed. In addition, we face a situation in which the reliability needs to be varied for each property prediction in a practical multi-objective optimization. We reported that such a prioritization is also possible by adjusting the DyRAMO settings. These results demonstrate the effectiveness of the data-driven generative strategy using DyRAMO. The DyRAMO is available at https://github.com/ycu-iil/DyRAMO.

Results

Construction of DyRAMO

Figure 2 shows the DyRAMO framework for achieving multi-objective optimization while maintaining the prediction reliability as much as possible. Reliability levels are explored through an iteration of the following three steps: setting the reliability level of each property prediction, designing molecules, and evaluating the results of molecular design according to their predicted properties and their reliability levels (Fig. 2). Details of each step are described below.

**Fig. 2: Workflow of DyRAMO (Dynamic Reliability Adjustment for Multi-objective Optimization).**

In Step 1, a reliability level ρ_i is set for each target property i, and the ADs of prediction models for the properties are defined based on the set reliability levels. To define AD in this study, we adopted the maximum value of Tanimoto similarity (MTS) for the training data, which is very simple but one of the most commonly used methods^9,44,45. At a reliability level ρ, a molecule is included in the AD of a prediction model if the highest value of Tanimoto similarities between the molecule and the ones in the training data of the model exceeds ρ. The spread size of an AD varies with the reliability level ρ.

In Step 2, the molecules are designed using a generative model to enter the overlapping region of the ADs given in Step 1 and perform multi-objective optimization. In this study, we employed ChemTSv2⁴⁶, which has proven performance in various molecular designs ranging from photo-functional materials¹¹ to drug design^8,10,47, to generate molecules using a recurrent neural network (RNN)⁴⁸ and a Monte Carlo tree search (MCTS)⁴⁹. Details of the ChemTSv2, and configuration setup for execution are described in the Methods section. The properties of the designed molecules were evaluated using prediction models based on supervised learning, and the details of the models used in this study are presented in the Methods section. The strategy for performing multi-objective optimization within the ADs is described in the next section. If the settings in Step 1 are not appropriate, the overlap of the prepared ADs may not be sufficiently wide, or the optimization of the predicted properties may fail.

In Step 3, the molecular design is evaluated for two aspects: whether the design is highly reliable and whether the designed molecules are sufficiently optimized for the properties, using the DSS (Degree of Simultaneous Satisfaction of prediction reliability and multiple property optimizations) score defined below:

$${{{\rm{DSS}}}}={\left({\prod }_{i=1}^{n}{{{{\rm{Scaler}}}}}_{i}({\rho }_{i})\right)}^{\frac{1}{n}}\times {{{{\rm{Reward}}}}}_{{{{\rm{top}}}}X\%}$$

(1)

Here, for each ith property, Scaler_i is a scaling function that standardizes the reliability level ρ_i to a value between 0 and 1 depending on its desirability. The scaling function parameters can also be adjusted when prioritization is desired among the properties to be optimized. Reward_topX% is the average of the top X reward values for the designed molecules, which indicates how well the multi-objective optimization was achieved. In this study, the top 10% of the reward values for the designed molecules were considered. It should be noted that this framework is constructed to work well with other definitions of ADs and uncertainties of prediction models, as long as any parameter of reliability level is available.

Exploring all the combinations of reliability levels is time-consuming because it requires repetitive molecular design until an appropriate combination is found. BO was introduced to make this exploration efficient. This technique allows the efficient exploration of input variables to maximize (or minimize) a predefined objective function. Here, the search space is a combination of possible reliability levels of the target properties, and the objective variable is the DSS score.

Strategy of molecular design incorporating ADs

In Step 2 of DyRAMO, molecular design aiming for multi-objective optimization with reliable predictions is performed using ChemTSv2. The design process in ChemTSv2 focuses on maximizing a predefined reward function. To optimize multiple predicted properties within the ADs of multiple prediction models, we define the reward as follows:

$${{{\rm{Reward}}}}=\left\{\begin{array}{ll}{\left({\prod }_{i=1}^{n}{v}_{i}^{{w}_{i}}\right)}^{\frac{1}{\mathop{\sum }_{i=1}^{n}{w}_{i}}}\quad &\,{\mbox{if}}\,{s}_{i}\ge \,{\rho }_{i}\, {\mbox{for all}}\,i=1,2,\ldots,n\\ 0 \quad\quad\quad\quad\quad\quad&\,{\mbox{otherwise}}\,\hfill\end{array}\right.$$

(2)

where s_i represents the MTS between the designed molecule and the training data for each prediction model for ith property, ρ_i is a reliability level, v_i is the desirability of the predicted property, w_i is the weight, and n is the number of properties to be optimized. As a molecular descriptor for computing Tanimoto similarity, Morgan fingerprint^50,51, whose radius and dimension were 2 and 2048, respectively, was applied. To calculate the desirability of the predicted properties, Gaussian scaling functions were defined for each property according to Brown et al.⁵² (Fig. S2). Weight w is allocated to each predicted property for prioritization during optimization. When calculating the reward function, the evaluation of the generated molecules was changed depending on whether the designed molecule was within the ADs of the property prediction models. If the MTS between the designed molecule and training data exceeds the reliability level defined for all target property predictions, the designed molecule is judged to be within the ADs. In this case, as a reward, the Dscore⁵³, which is the weighted geometric mean of the desirability of each property and is often used in the multi-objective optimization of molecular properties^5,10,54, is calculated. The Dscore is designed to optimize multiple objectives in a balanced manner as the score improves, without neglecting any specific objective. This balanced approach is particularly valuable in drug discovery, where achieving a reasonable degree of optimization for all objectives is critical. If the MTS is below the reliability level for any objective, it is judged to be outside the ADs, and the reward value is set to 0. By designing the reward function in this manner, we aim to achieve multi-objective optimization of the molecular predicted properties within the ADs of multiple prediction models.

Multi-objective optimization avoiding reward hacking in drug design

To verify the performance of DyRAMO, we conducted molecular design in the context of drug design. As a drug target, EGFR, which is targeted in the development of anticancer drugs⁵⁵, was selected. Drug design requires the optimization of pharmacokinetic properties in addition to activity against drug targets, as these properties are essential for drug efficacy and safety⁵⁶. Here, we considered two pharmacokinetic properties and addressed three: inhibitory activity against EGFR, metabolic stability, and membrane permeability. In adjusting the reliability levels, each property can be weighted in the DSS score. The reliability level of each property prediction was standardized by Scaler in the calculation of the DSS score. The Scaler has an adjustable parameter σ (Fig. S1), which is used to weight each property in adjusting reliability levels. Reducing the σ makes the Scaler stricter and more sensitive to fluctuations in reliability levels. The σ used in this study was three patterns: 0.15 (i) for tight, 0.25 (ii) for normal, and 0.35 (iii) for loose. The σ for each property here was set to (i) for inhibitory activity, and (ii) for metabolic stability and membrane permeability. Details of the development of prediction models for these properties, the molecular design using ChemTSv2, and the search for appropriate reliability levels using DyRAMO are described in the Methods section. To verify the effect of the molecular design on the prediction reliability, the same molecular design without considering the reliability was performed. Furthermore, to consider that desirable molecules exist in the training data of the property prediction models, we removed the approved EGFR inhibitors from the training data and then performed molecular design using this modified dataset.

Figure 3 shows the results of the molecular designs with reliability levels adjusted by DyRAMO (Fig. 3a, c) and without considering the prediction reliability (Fig. 3b, d). The optimized reliability levels were 0.66 for inhibitory activity, 0.55 for metabolic stability, and 0.43 for membrane permeability, and the exploration process of the reliability levels is discussed in the next paragraph, and Fig. 4. Figure 3a shows the optimization process for the predicted properties and reliability of the molecules designed using ChemTSv2 with the adjusted reliability levels. The MTS between a designed molecule and the training data for each property was evaluated as the reliability of the designed molecule. The series in Fig. 3a and b show the moving averages of each of the 200 designed molecules. As the design progressed, the predicted values of the three properties, particularly the inhibitory activity, increased. The reliabilities of the designed molecules exceeded the set reliability levels (dotted lines) during molecular design, indicating that they were included in the adjusted AD for each model. Figures 3c and S3 show examples of the designed molecules along with the predicted property values and MTSs of the training data. These molecules were predicted to have an inhibitory activity against EGFR (pIC₅₀) of over 8, with the other two predicted properties also generally well optimized. From the viewpoint of drug design, these molecules have a quinazoline substructure (highlighted in Fig. 3c), a characteristic substructure of known EGFR inhibitors⁵⁷, suggesting that the essential feature of EGFR inhibitors is captured during the molecular design process. However, as shown in Fig. 3b and d, multi-objective optimization without considering prediction reliability resulted in reward hacking. Although the predicted properties were optimized, the designed molecules differed significantly from the training data. The designed molecules shown in Fig. 3d are far from known EGFR inhibitors. These results indicate that with the adjustment of the reliability levels by DyRAMO, a molecular design can be effectively achieved.

**Fig. 3: Result of the molecular designs using DyRAMO.**

**Fig. 4: The processes of adjusting reliability levels by Bayesian optimization (BO) and random exploration in DyRAMO (Dynamic Reliability Adjustment for Multi-objective Optimization).**

In addition, even when approved drugs were removed from the training data of the property prediction models, DyRAMO designed molecules with desirable predicted properties. Among the generated molecules, gefitinib, one of the approved drugs that had been removed from the training data of the property prediction models, was included (Fig. 3e). The number of molecules satisfying the reliability thresholds set by DyRAMO (i.e., classified as inside the ADs) was counted for both the newly generated molecules and the training molecules of the property prediction models. This analysis revealed that DyRAMO generated a sufficient number of novel molecules (Fig. S6, Table S1). Furthermore, examining the distribution of these molecules in chemical space demonstrated that the generated molecules not only overlapped with the space occupied by training molecules with desirable predicted properties but also extended into previously unexplored space (Fig. S7). These results highlight DyRAMO’s capability to design novel and promising molecules with desirable predicted properties.

The exploration process for the optimal reliability levels using DyRAMO is shown in Fig. 4. For comparison, the results of randomly adjusting the reliability levels are also presented. Fig. 4a shows the evolution of the DSS score during BO-based and random exploration. In BO-based exploration, the DSS score increased as exploration progressed, eventually reaching approximately 0.38. In contrast, during random exploration, the DSS score fluctuated between 0.0 and 0.2 throughout. This demonstrates that the BO efficiently identified a combination of appropriate reliability levels that yielded a high DSS score. Figure 4b and c represent the evolution of the explored reliability levels and the scaled predicted properties of the designed molecules, respectively. Each scaled predicted property in each exploration step is calculated by extracting the designed molecules with the top 10% average of the three scaled properties from all designed molecules, and then averaging each scaled property for these extracted molecules. The explored reliability levels stabilized late in the exploration process and converged at approximately 0.60 for inhibitory activity against EGFR, and 0.40 for metabolic stability and membrane permeability, respectively. The predicted properties were optimized as exploration progressed. Figure 4d illustrates how DyRAMO balances the desirability of the reliability levels and predicted properties. At the beginning of the exploration, either the average reliabilities or scaled properties are high, or both are low. However, as the exploration progresses, it can be seen that the exploration moves toward the upper right in an attempt to optimize both the averaged reliabilities and scaled properties. Figure 4e shows that it is difficult to conduct efficient random exploration. These results demonstrated the efficiency of BO-based exploration in DyRAMO for designing molecules while avoiding reward hacking.

Prioritizing properties in reliability adjustment

In practical multi-objective optimization situations, the relative importance of each property may differ, and it is desirable to have high prediction reliability for properties of high importance, whereas for properties of lesser importance, it may be acceptable to compromise reliability. Therefore, we tested the feasibility of adjusting the reliability level of each property prediction by weighting it using the DyRAMO. This investigation was conducted under four scenarios: 1. without prioritization (no-priority) and 2. with the highest priority given to inhibitory activity against EGFR (inhibitory activity prioritized); 3. metabolic stability (metabolic stability prioritized); and 4. membrane permeability (permeability prioritized). The weighting was provided by the value of the σ of the Scaler in the DSS score. The σ for each property was set to (i) for properties given the highest priority and (iii) for others. In pattern 1 (no-priority), (iii) was set for all properties. Except for the weighting, the reliability levels were explored under the same conditions as those in the exploration conducted by DyRAMO in Section Multi-objective optimization avoiding reward hacking in drug design.

Figure 5a shows that under the four scenarios with different reliability priorities, the molecules were optimized as intended. The vertical axis in Fig. 5a shows the reliability levels explored using the DyRAMO. In pattern 2 (inhibitory activity prioritized), the average of reliability levels for the inhibitory activity was higher than that in the no-priority pattern, approximately 0.63 and 0.53, respectively. Similar increases in the explored reliability levels were observed for patterns 3 (metabolic stability prioritized) and 4 (permeability prioritized). Figure 5b–d shows examples of molecules designed with patterns 3, 4, and 5. The similarity between the designed and training molecules for the prediction model was relatively high for the prioritized properties. In particular, when priority was given to metabolic stability or membrane permeability (patterns 3 and 4), whose optimized reliability levels were relatively low in pattern 1, slightly different molecules were designed, as shown in Fig. 5c and d. These results indicate that DyRAMO can adjust the reliability, reflecting priorities among the target properties.

**Fig. 5: Effects of prioritization on adjustment of reliability levels.**

Molecular design in a situation where the overlap of ADs is not expected

In the context of multi-objective optimization using prediction models, it is assumed that the training molecules of each prediction model do not overlap or are not similar. In the molecular design examined in Section Multi-objective optimization avoiding reward hacking in drug design, several molecules, spanned multiple training datasets, leading to a situation in which the overlap of the ADs for each property was relatively large. Therefore, molecules likely to span multiple ADs were removed from the training data of the prediction model. Specifically, for data with MTS greater than 0.5, across the three datasets, a total of 777 molecules were removed (Fig. 6a). The prediction models were reconstructed using the partially removed dataset, and the reliability levels were explored using DyRAMO under the same conditions described in Section Multi-objective optimization avoiding reward hacking in drug design. ChemTSv2 with adjusted reliability levels successfully reproduced known EGFR inhibitors that were excluded from the training dataset (Fig. 6b). These molecules have high experimental pIC₅₀ values against EGFR (approximately 7-8), with generally optimized predicted metabolic stability and membrane permeability. This suggests that DyRAMO has the potential to identify drug candidates even when the molecules in each training dataset are not similar.

**Fig. 6: Removal of similar molecules across datasets and rediscovered examples in the removed molecules.**

Discussions

Reward hacking is a common phenomenon and an unavoidable issue as long as designing a perfect reward before optimization is difficult²³. As has been pointed out in the past²¹, reward hacking, in which multi-objective optimization seems to succeed, but the molecule is actually unpromising, occurred in this study, as shown in Fig. 3b, d. DyRAMO was designed with the intention of generating molecules within an overlapping reliable region by introducing the idea of adjustable AD for each property to avoid such reward hacking. As shown in Fig. 3, by appropriately adjusting the reliability levels, promising molecules, including known inhibitors, were designed by appropriately adjusting their reliability levels. In addition, because the exploration of an appropriate combination of reliability levels can be daunting, we formulated the reliability level exploration as a black-box optimization of a function that computes the DSS score that balances both the reliability and multi-objective optimization of the predicted properties. This allowed us to introduce an efficient exploration method using BO, and the optimal reliability levels were found quickly, as shown in Fig. 4. Furthermore, as described in Section Prioritizing properties in reliability adjustment, properties can be prioritized by assigning appropriate weights to the DSS score. This study introduces a solution to the problem of multi-objective optimization using generative models, which is likely to become even more important in the future.

Although DyRAMO is a general framework for avoiding reward hacking in multi-objective optimization, there is still room for improvement, even when focusing on molecular design. First, it is necessary to investigate how to establish an appropriate definition of the reliability or confidence of a prediction model. Although this study adopted a simple and basic method of defining AD, MTS from the training data, various other definitions of AD have been proposed. For instance, in addition to distance-based methods, such as Tanimoto similarity, range-based and probability density distribution-based methods⁵⁸. The molecular representations for these calculations include fingerprints, property-based vector representations, and graph representations⁵⁹. The optimal approach for assessing prediction reliability differs for each dataset⁵⁸, making the optimization of reliability indicators crucial. DyRAMO is applicable to methods for defining ADs other than MTS and has also been confirmed to work effectively when using a 3D-based similarity metric to define ADs (Figs. S10, S11). Determining which AD indicators are most suitable for molecular design remains an issue that should be further investigated in the future. In addition, activity cliffs (ACs), a phenomenon where structurally similar molecules exhibit significantly different activities⁶⁰, require particular caution, especially defining ADs based on structure similarity-based metrics, such as MTS. In situations where ACs occur, the assumption that structural similarity correlates with higher prediction reliability may collapse, and the defined AD may not work as intended. A tentative approach to address this issue is to include occurrence of ACs as part of the criteria for defining ADs. Significant efforts have been made to predict whether a given pair of molecules may exhibit ACs^{61,62,63,64,65,66}, and these methods could potentially be utilized to address the issue. A simple approach to handling ACs using these methods would be to classify molecule pairs predicted to cause ACs as outside ADs. Although it is currently challenging to completely avoid prediction errors caused by ACs using machine learning-based AC detection methods⁶⁷, addressing these issues remains an important area for future investigation.

In addition, the design of the reward, that is, the evaluation function for molecular design and the DSS score, may also have areas for advancement. Careful setting of the evaluation function in molecular design is required because it directly affects the molecule to be designed; therefore, a suitable design for DyRAMO should be explored. Although multi-objective optimization was formulated using the Dscore⁵³ in this study, this serves merely as one example, other optimization strategies using weighted sums^7,54 or Pareto solutions^7,68,69 may also be promising. In this study, most of the reliability levels were low when the number of target property variables was high, as shown in Fig. S12 and Table S2. This reflects the difficulty of the problem itself, but it could be improved by refining the molecular design model, careful setting of the molecular design, and modifying the DSS scores.

Finally, to overcome the limitations of molecular designs that depend on training data, as shown in this study, the use of methods that do not rely on currently available data is essential. As specific strategies, adding new data or predicting properties through molecular simulations based on physicochemical backgrounds can be considered. Although adding new data is costly, it is expected to expand the AD, thereby enabling reliable molecular design in a broader chemical space. Incorporating the estimation of physical properties by quantum chemistry, all-atom, coarse-grained, or docking simulations into molecular design is also a promising strategy. These methods are based on physicochemical principles and thus could mitigate risks associated with prediction reliability and activity cliffs, which are inherent in data-driven property estimation. Indeed, experimental validation of functional molecules designed by quantum chemistry calculations has been reported¹⁸. While it is challenging to simulate all properties, including permeability and pharmacokinetics, as addressed in this study, utilizing the available simulation techniques to verify and design molecules with low reliability will be necessary in the future.

Methods

Molecule generator

As a molecule generator, we employed ChemTSv2⁴⁶, which has been applied in materials^11,13,18,19 and drug design^8,10,47 and has been experimentally validated to have the ability to design desired molecules. ChemTSv2 combines two main components: molecular design with an RNN and chemical space exploration via MCTS. In MCTS, each node represents an atom in the molecule’s Simplified Molecular Input Line Entry System (SMILES)⁷⁰ string, exploring molecules by extending nodes to build a search tree. The exploration process using MCTS involves four steps: selection, expansion, simulation, and backpropagation. In the selection step, a leaf node is selected based on the upper confidence bound (UCB) score. The UCB score balances exploration and exploitation by adjusting the parameter C. For instance, a high C value (e.g., 1.0) emphasizes exploitation, focusing on unvisited nodes, whereas a low value (e.g., 0.1) encourages the exploration of promising nodes. In the expansion step, the selected node is expanded based on the RNN. In the simulation step, a molecule is generated by completing the SMILES part with the RNN, and is evaluated using a reward function. Finally, backpropagation updates the UCB score of the evaluated node based on the reward function. ChemTSv2 iterates these four steps to design molecules that maximize the value of the reward function. The RNN model was trained using 224,153 molecules from the ChEMBL database. These molecules were represented as randomized SMILES⁷¹ to explore a wide range of chemical space.

Construction and evaluation methods of the property prediction models

The construction and evaluation methods of the prediction models used to evaluate the designed molecules are described below. In this study, we tried two patterns for several target properties: 3 properties and 13 properties. When the 3 properties pattern, as target properties, the following three properties were selected: inhibitory activity against EGFR, metabolic stability, and membrane permeability. When 13 properties pattern, an additional ten properties were incorporated, including inhibitory activity against eight off-target proteins from tyrosine kinases and two other properties. Eight tyrosine kinase proteins were selected as described by Yoshizawa et al.¹⁰: receptor protein-tyrosine kinase erbB-2, Abelson tyrosine-protein kinase, proto-oncogene tyrosine-protein kinase, lymphocyte-specific tyrosine-protein kinase, platelet-derived growth factor receptor beta, vascular endothelial growth factor receptor 2, fibroblast growth factor receptor 1, and ephrin type-B receptor 4. For the other two properties, inhibitory activity against the human ether-a-go-go-related gene channel (hERG) and solubility were selected. The training data for these models were obtained from ChEMBL28⁷² for inhibitory activity against tyrosine kinases, metabolic stability, and membrane permeability; from Sato et al.⁷³ for hERG inhibition; and from Tayyebi et al.⁷⁴ for solubility. To prepare the input features for the prediction models, Morgan fingerprints with radii and dimensions of 2 and 2048, respectively, were calculated using RDKit software⁷⁵. As a prediction algorithm, light gradient boosting machine⁷⁶ was applied, and the hyperparameters were optimized using Optuna software⁷⁷. The prediction performance of each model was evaluated using 5-fold cross-validation (Fig. S13). Finally, for each property, prediction models were developed using the entire dataset for the molecular design.

Bayesian optimization-assisted adjustment of reliability levels

The goal of DyRAMO is to optimize the DSS score to establish an appropriate reliability level for defining AD while iteratively designing molecules. The search spaces, consisting of combinations of reliability level value sets for each target property prediction, are defined as follows: The units of the reliability level candidates are adjusted based on the number of target properties. For the 3 properties case, we generated 80 options per property, spanning from 0.1 to 0.9 by 0.01 increments, yielding 80³ threshold combinations for exploration. For the 13 properties case, we prepared three reliability level candidates per property: 0.3, 0.4, and 0.5, resulting in 3¹³ possible combinations. DyRAMO uses BO to efficiently explore the appropriate combinations of reliability levels that result in higher DSS values. For the acquisition function in the BO, the expected improvement⁷⁸ was used in all computational scenarios. The computation of BO, a Python library, PHYSBO⁷⁹, was used. After 10 random explorations were conducted for initialization, 30 explorations were conducted using BO. This total of 40 explorations is called an episode. Five episodes were run for each set of computational scenarios to account for the randomness in the exploration processes.

Data availability

Property prediction models and their training data are available on GitHub at https://github.com/ycu-iil/prediction_model_collection.Source data are provided with this paper.

Code availability

The Python code used to implement DyRAMO is available on GitHub at https://github.com/ycu-iil/DyRAMO and Zenodo at https://doi.org/10.5281/zenodo.14731974. DyRAMO is available under the MIT License. Methods for reproducing the results are described in the following: https://github.com/ycu-iil/DyRAMO/blob/main/doc/reproduction_instruction.md.

References

Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 361, 360–365 (2018).
Article ADS PubMed MATH CAS Google Scholar
Bilodeau, C., Jin, W., Jaakkola, T., Barzilay, R. & Jensen, K. F. Generative models for molecular discovery: Recent advances and challenges. WIREs Comput. Mol. Sci. 12, e1608 (2022).
Pang, C., Qiao, J., Zeng, X., Zou, Q. & Wei, L. Deep generative models in de novo drug molecule generation. J. Chem. Inf. Modeling 64, 2174–2194 (2023).
Article Google Scholar
Li, Y., Zhang, L. & Liu, Z. Multi-objective de novo drug design with conditional graph generative model. J. Cheminform. 10, 33 (2018).
Winter, R. et al. Efficient multi-objective molecular optimization in a continuous latent space. Chem. Sci. 10, 8016–8024 (2019).
Article PubMed PubMed Central MATH CAS Google Scholar
Khemchandani, Y. et al. Deepgraphmolgen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach. J. Cheminform. 12, 53 (2020).
Liu, X. et al. Drugex v2: de novo design of drug molecules by pareto-based multi-objective reinforcement learning in polypharmacology. J. Cheminform. 13, 85 (2021).
Ma, B. et al. Structure-based de novo molecular generator combined with artificial intelligence and docking simulations. J. Chem. Inf. Modeling 61, 3304–3313 (2021).
Article MATH CAS Google Scholar
Perron, Q. et al. Deep generative models for ligand-based de novo design applied to multi parametric optimization. J. Computational Chem. 43, 692–703 (2022).
Article MATH CAS Google Scholar
Yoshizawa, T. et al. Selective inhibitor design for kinase homologs using multiobjective monte carlo tree search. J. Chem. Inf. Modeling 62, 5351–5360 (2022).
Article MATH CAS Google Scholar
Sumita, M., Yang, X., Ishihara, S., Tamura, R. & Tsuda, K. Hunting for organic molecules with artificial intelligence: Molecules optimized for desired excitation energies. ACS Cent. Sci. 4, 1126–1133 (2018).
Article PubMed PubMed Central CAS Google Scholar
Radhakrishnapany, K. T. et al. Design of fragrant molecules through the incorporation of rough sets into computer-aided molecular design. Mol. Syst. Des. Eng. 5, 1391–1416 (2020).
Article MATH CAS Google Scholar
Zhang, Y. et al. Discovery of polymer electret material via de novo molecule generation and functional group enrichment analysis. Appl. Phys. Lett. 118, 223904 (2021).
Article ADS CAS Google Scholar
Chong, J. W., Thangalazhy-Gopakumar, S., Muthoosamy, K. & Chemmangattuvalappil, N. G. Design of bio-oil additives via molecular signature descriptors using a multi-stage computer-aided molecular design framework. Front. Chem. Sci. Eng. 16, 168–182 (2021).
Article Google Scholar
Ooi, Y. J. et al. Design of fragrance molecules using computer-aided molecular design with machine learning. Computers Chem. Eng. 157, 107585 (2022).
Article MATH CAS Google Scholar
S. V., S. S. et al. Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries. Nat. Mach. Intellig. 4, 720–730 (2022).
Yee, Q. Y., Hassim, M. H., Chemmangattuvalappil, N. G., Ten, J. Y. & Raslan, R. Optimization of quality, safety and health aspects in personal care product preservative design. Process Saf. Environ. Prot. 157, 246–253 (2022).
Article CAS Google Scholar
Sumita, M. et al. De novo creation of a naked eye–detectable fluorescent molecule based on quantum chemical computation and machine learning. Sci. Adv. 8 eabj3906 (2022).
Fujita, T. et al. Understanding the evolution of a de novo molecule generator via characteristic functional group monitoring. Sci. Technol. Adv. Mater. 23, 352–360 (2022).
Article PubMed PubMed Central MATH CAS Google Scholar
Fromer, J. C. & Coley, C. W. Computer-aided multi-objective optimization in small molecule discovery. Patterns 4, 100678 (2023).
Article PubMed PubMed Central MATH CAS Google Scholar
Gendreau, P. et al. Molecular assays simulator to unravel predictors hacking in goal-directed molecular generations. J. Chem. Inf. Modeling 63, 3983–3998 (2023).
Article MATH CAS Google Scholar
Hadfield-Menell, D., Milli, S., Abbeel, P., Russell, S. & Dragan, A. Inverse reward design https://arxiv.org/abs/1711.02827 (2017).
Skalse, J., Howe, N. H. R., Krasheninnikov, D. & Krueger, D. Defining and characterizing reward hacking https://arxiv.org/abs/2209.13085 (2022).
Renz, P., Van Rompaey, D., Wegner, J. K., Hochreiter, S. & Klambauer, G. On failure modes in molecule generation and optimization. Drug Discov. Today.: Technol. 32-33, 55–63 (2019).
Article PubMed Google Scholar
Langevin, M., Vuilleumier, R. & Bianciotto, M. Explaining and avoiding failure modes in goal-directed generation of small molecules. J. Cheminform. 14, 40 (2022).
Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Modeling 60, 5714–5723 (2020).
Article MATH CAS Google Scholar
Netzeva, T. I. et al. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships: The report and recommendations of ecvam workshop 52,. Alternatives Lab. Anim. 33, 155–173 (2005).
Article CAS Google Scholar
Dimitrov, S. et al. A stepwise approach for defining the applicability domain of sar and qsar models. J. Chem. Inf. Modeling 45, 839–849 (2005).
Article MATH CAS Google Scholar
Schroeter, T. S. et al. Estimating the domain of applicability for machine learning qsar models: a study on aqueous solubility of drug discovery molecules. J. Computer-Aided Mol. Des. 21, 651–664 (2007).
Article ADS MATH CAS Google Scholar
Weaver, S. & Gleeson, M. P. The importance of the domain of applicability in qsar modeling. J. Mol. Graph. Model. 26, 1315–1326 (2008).
Article PubMed MATH CAS Google Scholar
Dragos, H., Gilles, M. & Alexandre, V. Predicting the predictability: A unified approach to the applicability domain problem of qsar models. J. Chem. Inf. Modeling 49, 1762–1776 (2009).
Article MATH CAS Google Scholar
Kühne, R., Ebert, R.-U. & Schüürmann, G. Chemical domain of qsar models from atom-centered fragments. J. Chem. Inf. Modeling 49, 2660–2669 (2009).
Article MATH Google Scholar
Hanser, T., Barber, C., Marchaland, J. F. & Werner, S. Applicability domain: towards a more formal definition. SAR QSAR Environ. Res. 27, 865–881 (2016).
Article CAS Google Scholar
Gal, Y. & Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning https://arxiv.org/abs/1506.02142 (2015).
Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles https://arxiv.org/abs/1612.01474 (2016).
Sensoy, M., Kaplan, L. & Kandemir, M. Evidential deep learning to quantify classification uncertainty https://arxiv.org/abs/1806.01768 (2018).
Janet, J. P., Duan, C., Yang, T., Nandy, A. & Kulik, H. J. A quantitative uncertainty metric controls error in neural network-driven chemical discovery. Chem. Sci. 10, 7913–7922 (2019).
Article PubMed PubMed Central CAS Google Scholar
Tran, K. et al. Methods for comparing uncertainty quantifications for material property predictions. Mach. Learn.: Sci. Technol. 1, 025006 (2020).
MATH Google Scholar
Soleimany, A. P. et al. Evidential deep learning for guided molecular property prediction and discovery. ACS Cent. Sci. 7, 1356–1367 (2021).
Article PubMed PubMed Central MATH CAS Google Scholar
Wang, D. et al. A hybrid framework for improving uncertainty quantification in deep learning-based qsar regression modeling. J. Cheminform. 13, 69 (2021).
Kaneko, H. & Funatsu, K. Applicability domains and consistent structure generation. Mol. Inform. 36, 1600032 (2016).
Langevin, M. et al. Impact of applicability domains to generative artificial intelligence. ACS Omega 8, 23148–23167 (2023).
Article PubMed PubMed Central MATH CAS Google Scholar
Snoek, J., Larochelle, H. & Adams, R. P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inform. Process. Syst. 25 2951–2959 (2012).
Tetko, I. V. et al. Critical assessment of qsar models of environmental toxicity against tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection. J. Chem. Inf. Modeling 48, 1733–1746 (2008).
Article MATH CAS Google Scholar
Kar, S., Roy, K. & Leszczynski, J.Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling, 141-169 (Springer New York, 2018).
Ishida, S. et al. Chemtsv2: Functional molecular design using de novo molecule generator. WIREs Comput. Mol. Sci. 13, e1680 (2023).
Shimizu, Y. et al. Ai-driven molecular generation of not-patented pharmaceutical compounds using world open patent data. J. Cheminform. 15, 120 (2023).
Rumelhart, D. E. & McClelland, J. L. Learning Internal Representations by Error Propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, 318–362 (MIT Press, 1987).
Coulom, R. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. 5th International Conference on Computer and Games, 72–83 (Springer, 2006).
Morgan, H. L. The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. J. Chem. Documentation 5, 107–113 (1965).
Article MATH CAS Google Scholar
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Modeling 50, 742–754 (2010).
Article MATH CAS Google Scholar
Brown, N., Fiscato, M., Segler, M. H. & Vaucher, A. C. GuacaMol: Benchmarking models for de novo molecular design. J. Chem. Inf. Modeling 59, 1096–1108 (2019).
Article CAS Google Scholar
Cummins, D. J. & Bell, M. A. Integrating everything: The molecule selection toolkit, a system for compound prioritization in drug discovery. J. Medicinal Chem. 59, 6999–7010 (2016).
Article MATH CAS Google Scholar
Blaschke, T. et al. Reinvent 2.0: An ai tool for de novo drug design. J. Chem. Inf. Modeling 60, 5918–5922 (2020).
Article MATH CAS Google Scholar
Herbst, R. S. Review of epidermal growth factor receptor biology. Int. J. Radiat. Oncol. Biol. Phys. 59, S21–S26 (2004).
Article MATH Google Scholar
Lambrinidis, G. & Tsantili-Kakoulidou, A. Multi-objective optimization methods in novel drug design. Expert Opin. Drug Discov. 16, 647–658 (2020).
Article PubMed Google Scholar
Bhatia, P. et al. Novel quinazoline-based egfr kinase inhibitors: A review focussing on sar and molecular docking studies (2015-2019). Eur. J. Medicinal Chem. 204, 112640 (2020).
Article MATH CAS Google Scholar
Sahigara, F. et al. Comparison of different approaches to define the applicability domain of qsar models. Molecules 17, 4791–4810 (2012).
Article PubMed PubMed Central MATH CAS Google Scholar
Rakhimbekova, A. et al. Comprehensive analysis of applicability domains of qspr models for chemical reactions. Int. J. Mol. Sci. 21, 5542 (2020).
Article PubMed PubMed Central MATH CAS Google Scholar
Maggiora, G. M. On outliers and activity cliffswhy qsar often disappoints. J. Chem. Inf. Modeling 46, 1535–1535 (2006).
Article MATH CAS Google Scholar
Heikamp, K., Hu, X., Yan, A. & Bajorath, J. Prediction of activity cliffs using support vector machines. J. Chem. Inf. Modeling 52, 2354–2365 (2012).
Article MATH CAS Google Scholar
de la Vega de León, A. & Bajorath, J. Prediction of compound potency changes in matched molecular pairs using support vector regression. J. Chem. Inf. Modeling 54, 2654–2663 (2014).
Article MATH Google Scholar
Husby, J., Bottegoni, G., Kufareva, I., Abagyan, R. & Cavalli, A. Structure-based predictions of activity cliffs. J. Chem. Inf. Modeling 55, 1062–1076 (2015).
Article CAS Google Scholar
Horvath, D. et al. Prediction of activity cliffs using condensed graphs of reaction representations, descriptor recombination, support vector machine classification, and support vector regression. J. Chem. Inf. Modeling 56, 1631–1640 (2016).
Article MATH CAS Google Scholar
Iqbal, J., Vogt, M. & Bajorath, J. Prediction of activity cliffs on the basis of images using convolutional neural networks. J. Computer-Aided Mol. Des. 35, 1157–1164 (2021).
Article ADS MATH CAS Google Scholar
Park, J., Sung, G., Lee, S., Kang, S. & Park, C. Acgcn: Graph convolutional networks for activity cliff prediction between matched molecular pairs. J. Chem. Inf. Modeling 62, 2341–2351 (2022).
Article MATH CAS Google Scholar
van Tilborg, D., Alenicheva, A. & Grisoni, F. Exposing the limitations of molecular machine learning with activity cliffs. J. Chem. Inf. Modeling 62, 5938–5951 (2022).
Article MATH Google Scholar
Tamura, R., Terayama, K., Sumita, M. & Tsuda, K. Ranking pareto optimal solutions based on projection free energy. Phys. Rev. Mater. 7, 093804 (2023).
Article MATH CAS Google Scholar
Suzuki, T., Ma, D., Yasuo, N. & Sekijima, M. Mothra: Multiobjective de novo molecular generation using monte carlo tree search. J. Chem. Inf. Modeling 64, 7291–7302 (2024).
Article CAS Google Scholar
Weininger, D. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. J. Chem. Inf. Computer Sci. 28, 31–36 (1988).
Article MATH CAS Google Scholar
Arús-Pous, J. et al. Randomized smiles strings improve the quality of molecular generative models. J. Cheminform. 11, 71 (2019).
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2018).
Article PubMed Central MATH Google Scholar
Sato, T., Yuki, H. & Honma, T. Quantitative prediction of herg inhibitory activities using support vector regression and the integrated herg dataset in amed cardiotoxicity database. Chem.-Bio Inform. J. 21, 70–80 (2021).
Article MATH CAS Google Scholar
Tayyebi, A. et al. Prediction of organic compound aqueous solubility using machine learning: a comparison study of descriptor-based and fingerprints-based models. J. Cheminform. 15, 99 (2023).
Landrum, G. Rdkit: Open-source cheminformatics. https://www.rdkit.org.
Ke, G. et al. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inform. Process. Syst. 30, 3146–3154 (2017).
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A next-generation hyperparameter optimization framework https://arxiv.org/abs/1907.10902 (2019).
Jones, D. R., Schonlau, M. & Welch, W. J. J. Glob. Optim. 13, 455–492 (1998).
Article Google Scholar
Motoyama, Y. et al. Bayesian optimization package: Physbo. Computer Phys. Commun. 278, 108405 (2022).
Article MathSciNet MATH CAS Google Scholar
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).
Article CAS Google Scholar

Download references

Acknowledgements

This research was conducted in “Development of a Next-generation Drug Discovery AI through Industry-Academia Collaboration (DAIIA)” under grant no.JP23nk0101111 (T.H. and K.T.) and a Research Support Project for Life Science and Drug Discovery under grant no.JP22ama121023 (K.T.) from Japan Agency for Medical Research and Development (AMED). This work was also supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) under the grants: Simulation- and AI-driven next-generation medicine and drug discovery based on “Fugaku” (Grant Number: JPMXP1020230120, K.T.), feasibility studies for the next-generation computing infrastructure, Data Creation and Utilization Type Material Research and Development Project (Grant Number: JPMXP1122683430, K.T.), JST FOREST Program (Grant Number: JPMJFR232U, K.T.), RIKEN Junior Research Associate Program (T.Y.), and Chugai Foundation for Innovative Drug Discovery Science (C-FINDs) (T.Y.).

Author information

Authors and Affiliations

Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan
Tatsuya Yoshizawa, Shoichi Ishida & Kei Terayama
RIKEN Center for Biosystems Dynamics Research, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan
Tatsuya Yoshizawa, Tomohiro Sato & Teruki Honma
HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan
Masateru Ohta
RIKEN Center for Advanced Intelligence Project, 1-4-1, Nihonbashi, Chuo-ku, 103-0027, Tokyo, Japan
Kei Terayama
MDX Research Center for Element Strategy, Institute of Science Tokyo, 4259, Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Kanagawa, Japan
Kei Terayama

Authors

Tatsuya Yoshizawa
View author publications
Search author on:PubMed Google Scholar
Shoichi Ishida
View author publications
Search author on:PubMed Google Scholar
Tomohiro Sato
View author publications
Search author on:PubMed Google Scholar
Masateru Ohta
View author publications
Search author on:PubMed Google Scholar
Teruki Honma
View author publications
Search author on:PubMed Google Scholar
Kei Terayama
View author publications
Search author on:PubMed Google Scholar

Contributions

Tatsuya Yoshizawa: conceptualization (lead); methodology (lead); software (lead); writing—original draft (lead); writing—review and editing (lead). Shoichi Ishida: conceptualization (support); methodology (lead); software (support); writing—review and editing (lead). Tomohiro Sato: methodology (support); software (support), writing—review and editing (support). Masateru Ohta: methodology (support); software (support), writing—review and editing (support). Teruki Honma: supervision (support); funding acquisition (lead); project administration (lead); methodology (support); writing—review and editing (support). Kei Terayama: supervision (lead); funding acquisition (lead); methodology (lead); project administration (lead); conceptualization (lead); software (support); writing—review and editing (lead).

Corresponding author

Correspondence to Kei Terayama.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Yoshizawa, T., Ishida, S., Sato, T. et al. A data-driven generative strategy to avoid reward hacking in multi-objective molecular design. Nat Commun 16, 2409 (2025). https://doi.org/10.1038/s41467-025-57582-3

Download citation

Received: 24 July 2024
Accepted: 24 February 2025
Published: 11 March 2025
DOI: https://doi.org/10.1038/s41467-025-57582-3