Introduction

Innovative drug research, propelled by advances in chemistry, aims to identify effective molecular agents for disease treatment1. While small molecules, especially those following Lipinski’s rule, remain central to drug development, their limited binding interfaces and insufficient affinity for desired targets often result in poor stability2, short half-life, and off-target effects3,4. To address these limitations, researchers are exploring distinct modalities to expand the druggable chemical space, with macrocycles emerging as promising candidates for overcoming the constraints of traditional small-molecule drugs1,5. Macrocycles are typically defined as cyclic molecules or peptides containing a ring of 12 or more atoms, and they bridge the gap between small molecules and antibodies1,6,7. These structures can form large contact interfaces with proteins8, giving higher binding affinity and improved selectivity. Meanwhile, their chameleonic properties7 enhance drug stability and confer favorable pharmacokinetic profiles9,10,11. Macrocycles have now been successfully applied as therapeutic agents against a variety of drug targets, such as kinases, proteases, and G-protein-coupled receptors, with excellent outcomes in the treatment of many diseases.

Rational macrocyclic drug design typically involves the macrocyclization of bioactive linear compounds and the subsequent modification of the resulting macrocycles12. In the macrocyclization step, there are already some computational methods13,14 that leverage geometrically constrained linker searching and connection strategy from a prebuilt 3D linker database to derive distinct macrocyclic skeletons. In addition, our team previously introduced Macformer, a substructure-aligned SMILES augmentation strategy with a Transformer architecture to automatically generate macrocyclic linkers15. However, given a bioactive macrocyclic compound as a starting point, how to further modify and optimize its structure (e.g., through macrocyclic scaffold hopping or substituent modifications on the macrocyclic ring) to enhance druggability or expand the pool of candidates for early-stage clinical screening remains an unresolved challenge. So far, such modifications primarily depend on the expert experience of pharmaceutical chemists or the utilization of iterative methods such as pharmacophore replacement, which are time-consuming and labor-intensive. A representative case is the discovery of macrocyclic compound Z11, a selective CDK9 inhibitor, designed to overcome resistance to Osimertinib in non-small cell lung cancer. Beginning with the linear precursor Z0, the researchers first performed macrocyclization to obtain Z1, followed by three iterative rounds of structural modification, ultimately yielding the optimized macrocyclic compound Z1116. Despite such progress, effective computational strategies for the structural optimization of macrocycles remain scarce in the literature.

Thoroughly dissecting the intricate chemical space of macrocycles provides fundamental guidance for their structural modification and optimization. Because macrocycles are an emerging and highly underexplored class of structures, knowledge about them is limited, and researchers have been struggling to elucidate which properties make a macrocyclic compound suitable for drug development. Viarengo-Baker et al.17 conducted a principal component analysis of oral and non-oral macrocyclic compounds and identified 13 properties that could be used to design macrocyclic compounds in the same chemical space as oral macrocyclic drugs. Jimenez et al.1 systematically analyzed the FDA-approved macrocyclic drugs and proposed a simple bi-descriptor model, i.e., hydrogen bond donors less than or equal to 7 in combination with either molecular weight less than 1000 or cLogP greater than 2.5, as filtering criteria for screening oral macrocyclic drugs. However, due to the structural complexity of macrocycles and the scarcity of available macrocyclic therapeutics, explicit descriptors offer only limited guidance for the design of macrocyclic drugs.
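To make the bi-descriptor criterion concrete, the following minimal sketch encodes it as a filter. It is illustrative only: the `Descriptors` container, the function name, and the four example candidates are hypothetical, and the descriptor values are assumed to have been computed beforehand with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one macrocycle (hypothetical container)."""
    hbd: int           # number of hydrogen-bond donors
    mol_weight: float  # molecular weight in Da
    clogp: float       # calculated logP


def passes_bi_descriptor_rule(d: Descriptors) -> bool:
    """HBD <= 7 combined with either MW < 1000 or cLogP > 2.5."""
    return d.hbd <= 7 and (d.mol_weight < 1000 or d.clogp > 2.5)


# Made-up candidates chosen to exercise each branch of the rule
candidates = {
    "A": Descriptors(hbd=5, mol_weight=850.0, clogp=1.0),   # passes via MW
    "B": Descriptors(hbd=6, mol_weight=1200.0, clogp=3.4),  # passes via cLogP
    "C": Descriptors(hbd=9, mol_weight=700.0, clogp=4.0),   # fails: too many donors
    "D": Descriptors(hbd=4, mol_weight=1100.0, clogp=1.5),  # fails: neither MW nor cLogP
}
kept = [name for name, d in candidates.items() if passes_bi_descriptor_rule(d)]
```

Such a filter is inexpensive to apply, but it only encodes three scalar descriptors, which is consistent with the limited guidance noted above.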

In this paper, we explore the compounds situated in the immediate chemical neighborhood of lead macrocycles, which have great potential to serve as drug candidates. We achieve this by introducing CycleGPT, a molecular generative model tailored for systematic exploration and expansion of the privileged macrocycle chemical space. CycleGPT is characterized by a progressive transfer learning paradigm that incrementally transfers knowledge from pre-trained chemical language models to specialized macrocycle generation, overcoming critical data scarcity in the target domain. In particular, it can effectively sample macrocycles from the neighboring chemical space of privileged macrocyclic candidates, converting the problem of macrocycle structural modification into the exploration of the chemical space of macrocycles. In addition, we designed a generative sampling scheme named HyperTemp that enables CycleGPT to dynamically balance the exploitation of high-probability tokens with the exploration of alternative pathways, thus achieving a superior equilibrium between the novelty and validity of macrocycles. Using CycleGPT, we designed a number of macrocycles for the JAK2 target, starting from the macrocycles designed by Macformer in our previous research15. In this CycleGPT-driven prospective drug design, three potent macrocyclic JAK2 inhibitors were identified, with IC50 values of 1.65 nM, 1.17 nM, and 5.41 nM, respectively. One of them exhibits an even better kinase selectivity profile than the marketed drugs Fedratinib and Pacritinib. Furthermore, this macrocycle can inhibit rhEPO-mediated polycythemia and splenomegaly in BALB/c mice at a lower dose than Fedratinib and Pacritinib. These therapeutic candidates demonstrate the significant potential of CycleGPT for advancing macrocyclic drug discovery.

Results

Model overview

GPT-based models have demonstrated exceptional performance in sequence processing tasks in recent years18. To explore the macrocyclic chemical space, we developed a model called CycleGPT based on a progressive transfer learning strategy. The model interprets and generates chemical language representations at a concise and efficient character level, which allows it to sample the adjacent chemical space of privileged macrocycles and explore potential alternative macrocyclic drug candidates. To address the shortage of macrocyclic compounds, CycleGPT was first pre-trained on 365,063 active compounds with biological activity labels to grasp SMILES semantics. These compounds were extracted from the ChEMBL19 database with IC50/EC50/Kd/Ki values lower than 1 μM and SMILES strings shorter than 140 tokens in length. Subsequently, we collected 19,920 macrocyclic molecules with SMILES lengths of less than 140 characters from the ChEMBL and DrugBank20 databases. These macrocycles were used for transfer learning on the pre-trained CycleGPT model, with the aim of adapting the model’s knowledge from the chemical space of bioactive linear molecules to that of macrocyclic compounds. The model can be further fine-tuned with macrocyclic hits for designing target-specific drug candidates (Fig. 1a, b). We use the Lion21 optimizer to adjust the network parameters.

Fig. 1: Overview of CycleGPT.

a Architecture of CycleGPT. b Training process of CycleGPT. Using a progressive transfer learning strategy, bioactive molecules are used first for pretraining GPT, then the macrocyclic compounds and downstream application targeting JAK2 are used to finetune the model in the second and third steps, respectively. c, d UMAP visualization of molecules to illustrate the transfer of chemical space. Altogether 1000 generated molecules were stochastically selected after each of the three stages of the transfer learning process. This shows that the model correctly migrates from the active linear compound space to the macrocyclic space and further migrates to the JAK2 macrocyclic space.

In molecular generation, the sampling algorithm and network architecture jointly determine the quality of the generated SMILES. However, existing sampling algorithms still perform poorly on macrocyclic SMILES, especially in achieving a satisfactory balance between the validity and novelty of generated compounds. In this paper, we propose a heuristic sampling algorithm, HyperTemp, which applies a transformation on top of tempered sampling to enable fine-grained adjustment of the token probabilities. HyperTemp appropriately reduces the probability of optimal tokens while increasing the probability of suboptimal tokens, improving novelty while maintaining the validity of the generated macrocycles. We combine HyperTemp sampling with the CycleGPT architecture to realize a complete macrocyclic generation model, and systematically compare HyperTemp with existing sampling algorithms. To verify the validity and effectiveness of the introduced transformation, we also selected three other transformations that adjust the probabilities differently and compared them with HyperTemp (Supplementary Formulas 1–3). To the best of our knowledge, this work is among the first to comprehensively explore sampling algorithms in molecular generation for macrocycles, an underexplored yet crucial subclass.

Model performance

The performance of CycleGPT-HyperTemp was evaluated and compared with other molecular generation methods22,23,24,25,26,27,28,29,30, with results shown in Table 1. In addition to validity and macrocycle_ratio, greater emphasis is placed on novel_unique_macrocycles, a comprehensive metric quantifying the proportion of generated valid and unique macrocycles that are absent from the training dataset. Char_RNN can generate enough valid macrocycles but with a very low novel_unique_macrocycles value (11.76%), while the GPT-based models MolGPT and cMolGPT failed to capture the semantics of macrocycles. Llamol and MTMol-GPT demonstrate advantages over other models in terms of the novel_unique_macrocycles metric (38.13% and 31.09%, respectively), yet there remains a notable margin compared to our CycleGPT-HyperTemp model (55.80%).

Table 1 Comparison of CycleGPT-HyperTemp and other models

To have a more comprehensive understanding of the characteristics and performance of the HyperTemp sampling, we took CycleGPT as the base model and replaced HyperTemp with 13 different sampling strategies. As shown in Supplementary Table 1, when employing the MaxEntropy or Noised Top-K algorithm, CycleGPT undergoes a cliff-like decline across all metrics. Overall, with respect to the novel_unique_macrocycles metric, the HyperTemp sampling method performs best among all algorithms. We further analyzed the effect of the HyperTemp sampling algorithm on the generated tokens. As illustrated in Fig. 2a–c, through finer probability adjustment based on tempered sampling, HyperTemp further reduces the preference for optimal tokens and enhances the exploration of suboptimal tokens, which promotes the diversity of token sampling and improves the novelty.

Fig. 2: Behavior of HyperTemp sampling and chemical space exploration of Lorlatinib using CycleGPT-HyperTemp.

Distribution of the percentage change in generative probabilities for the suboptimal tokens (a), and for the optimal tokens (b) when comparing HyperTemp sampling against Tempered sampling (as baseline). The optimal (suboptimal) token is defined as the token with the largest (second-largest) generative probability in each generation for the two sampling schemes. Specifically, percentage changes are calculated as the token probability of HyperTemp sampling minus that of Tempered sampling, divided by the latter. c Distribution of the rank of generated tokens from Tempered sampling and HyperTemp sampling for a total of 30,000 SMILES generated. For the optimal token, the frequency of being sampled decreases; for suboptimal tokens, it increases. This indicates that HyperTemp sampling effectively expands the exploration of suboptimal tokens compared with Tempered sampling, which is key to improving the novelty. d UMAP visualization of the structural modification of Lorlatinib as an example. A total of 1000 generated molecules were randomly selected after the transfer learning process of Lorlatinib. Chemical structures of representative compounds illustrating structural modifications are presented. These modifications include the scaffold hopping of macrocycles (marked with blue circles) and peripheral substituent modifications (marked with red circles).

In addition, we conducted a systematic evaluation of MOSES9 metrics for macrocycles generated by different methods, excluding those with poor performance in novel_unique_macrocycles (<20%). Considering the numerous comparison methods and their subtle variations, we summarized only the top three methods for each property and displayed them in Supplementary Tables 2–3. Across the 10 properties, CycleGPT combined with either HyperTemp or Top-p sampling ranks in the top three for six, more than any other method. Moreover, the molecular property analyses (Supplementary Figs. 1–3) reveal that the macrocycles generated by CycleGPT-HyperTemp follow a distribution similar to that of the training dataset. The above results demonstrate that our CycleGPT-HyperTemp model can generate a higher proportion of valid and unique macrocycles while maintaining molecular diversity and property quality.

To evaluate the effectiveness of our method in downstream target applications, we further expanded the chemical space of the macrocyclic compound Lorlatinib, as an example, using our CycleGPT-HyperTemp method. The visualization of the generated chemical space is shown in Fig. 2d. After fine-tuning with Lorlatinib, the generated macrocycles migrated to the nearby chemical space of Lorlatinib, demonstrating that the model explores the intended chemical space. Furthermore, Fig. 2d illustrates that our method can achieve structural modification of macrocycles, including (1) macrocyclic scaffold hopping and (2) peripheral substituent modifications. These structural modifications are consistent with common modification strategies in medicinal chemistry, reflecting the practicality of our method.

Modification of macrocyclic JAK2 inhibitors using CycleGPT

The Janus kinases (JAKs), a family of intracellular tyrosine kinases, play a pivotal role in mediating the signaling of many cytokines and are implicated in the pathogenesis of various diseases, such as myeloproliferative neoplasms and rheumatoid arthritis31,32,33. We previously designed three macrocyclic Janus kinase 2 (JAK2) inhibitors using Macformer through macrocyclization of Fedratinib, with compound M3 (renamed for clarity) demonstrating potential as a drug candidate5. Having alternative drug candidates is an effective strategy to provide contingency options to mitigate risks and ensure continuity in the drug development process. In this study, CycleGPT was utilized to explore their neighboring chemical space to obtain potential alternative drug candidates targeting JAK2 for further investigation.

Ten-fold augmentation with randomized SMILES strings was performed on the three macrocyclic JAK2 inhibitors, and the augmented set was used to fine-tune the CycleGPT model toward this specific chemical space. The HyperTemp algorithm was employed in the inference process, generating 5058 macrocycles distributed around the starting macrocycles, which indicates that CycleGPT-HyperTemp learned and explored the space of the JAK2 macrocycles correctly (Fig. 1d). To further enrich the macrocycles generated by CycleGPT, we employed CyclePred, a heterogeneous graph transformer model, to predict JAK2 inhibitory activity (Fig. 3a). The RMSE of CyclePred over five repeats of 5-fold cross-validation is 0.6717, and it falls to 0.5776 when predicting the JAK2 macrocycles outside the training set (lower is better). The predicted and experimental -pIC50 values exhibit a good correlation, with an R2 value of 0.7036 (Fig. 3c).

Fig. 3: Workflow and performance of CyclePred in predicting the activity of JAK2 inhibitors, and in vitro activities of the six selected generated macrocycles.

a The workflow of CyclePred. b Pie chart of the activity data of the collected JAK2 inhibitors. c The R2 of CyclePred on the macrocycle dataset (macrocycles in the test and validation sets). d Scatter plot of docking scores for the top 70 predicted macrocycles, including the 6 synthesized macrocycles and other candidates, obtained using Maestro Prime and Rosetta docking. Six molecules that failed to dock were excluded. e The chemical structures of the six synthesized macrocycles. f The in vitro activities of compounds 1–6. Data are mean ± SD; each parallel experiment was repeated three times.

After predicting JAK2 activity with CyclePred, the top 70 macrocycles were selected for docking simulation using Glide34 and Rosetta35, respectively. The Fedratinib-JAK2 kinase domain complex (PDB code 6VNE)36 was used for molecular docking of these 70 macrocyclic compounds. Based on the docking results (Fig. 3d) and the synthetic experience of pharmaceutical chemists, six macrocycles (Fig. 3e) were finally selected for chemical synthesis (Supplementary Figs. 5–19) and biological activity evaluation. The binding modes of these six macrocyclic compounds with the target are shown in Supplementary Fig. 4 (visualized using ChimeraX)37. All six compounds form favorable interactions with the target.

Activities of designed macrocyclic JAK2 inhibitors

Kinase assays were performed using the Z′-LYTE™ kinase assay kit to determine the inhibitory activities of these six macrocyclic compounds, with compound M3 from our previous study and Fedratinib as positive controls (Fig. 3f and Supplementary Fig. 21). Three of the six macrocycles demonstrated strong JAK2 inhibition with single-digit nanomolar IC50 values, and the most potent, compound 2 (IC50 = 1.17 nM), showed superior activity compared to Fedratinib. These compounds were further evaluated for their antiproliferative activities against HEL and SET-2 cells (Fig. 3f), which are human erythroleukemia cells and primary thrombocytosis cells containing the JAK2 V617F mutation. Compound 2 exhibited significant antiproliferative activities against HEL and SET-2 cells carrying the JAK2 V617F mutation by inhibiting the JAK2-STAT signaling pathway in a dose-dependent manner (Fig. 4a and Supplementary Fig. 22). Furthermore, compound 2 displayed a kinome selectivity profile superior to those of Fedratinib and Pacritinib (Fig. 4b). At a concentration of 100 nM, compound 2 inhibited 17 wild-type kinases (percent control <35%), compared with 55 and 34 for Pacritinib and Fedratinib, respectively. The kinase selectivity profile of Fedratinib can be obtained from our previous article5.

Fig. 4: Compound 2 potently inhibited JAK2-mediated signaling and inhibited rhEPO-mediated polycythemia and splenomegaly in BALB/c mice.

a Western blot analysis of phosphorylation of JAK2 (Y1007/8), STAT3, STAT5, AKT and ERK. b Kinase selectivity profiles of Pacritinib and compound 2 against 468 kinases. c Representative photographs of the spleen. d Spleen weight. e Hematocrit. f Reticulocyte count. g Ter119/CD71 erythroblast population in spleen. h Ter119/CD71 erythroblast population in bone marrow. i Body weight. All graphs show mean ± SEM. ####P < 0.0001 versus control. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001 versus model. ^P < 0.05; ^^P < 0.01 versus 100 mg/kg of compound 2 groups.

Building upon the superior inhibition and kinase profiling results of compound 2, in vivo efficacy studies were conducted to further validate its biological effects. Studies have shown that treating rodents with recombinant human erythropoietin (rhEPO) can simulate the main symptoms of human polycythemia vera (PV)38,39, including increased reticulocyte count and hematocrit and splenomegaly caused by extramedullary hematopoiesis. We treated BALB/c mice with daily s.c. injections of 10 units of rhEPO along with once-daily oral administration of either compound 2 (dosed at 25, 50, and 75 mg/kg) or vehicle for 4 consecutive days. Compared to the control group, the rhEPO-induced model group showed elevated reticulocyte counts and hematocrit, expansion of Ter119/CD71 erythroblasts in the spleen and bone marrow, and splenomegaly. Notably, compound 2 suppressed rhEPO-induced hematocrit expansion, reticulocytosis, and splenomegaly (Fig. 4c–f), with the effects of 100 mg/kg compound 2 being comparable to those of 120 mg/kg Fedratinib and superior to those of 120 mg/kg Pacritinib. Moreover, compound 2 inhibited the expansion of Ter119/CD71 erythroblasts in both the spleen and bone marrow (Fig. 4g, h and Supplementary Figs. 20 and 23a–b). Additionally, it had no significant impact on white blood cell counts or body weight (Fig. 4i and Supplementary Fig. 23c). Overall, these results indicate that compound 2 has potential for the treatment of polycythemia.

Discussion

Macrocycles are a promising class of drugs with properties that could address the shortcomings of small-molecule drugs. Structural modification of privileged macrocycles is an important route for drug design. To handle macrocycle structural modification more effectively, we proposed CycleGPT, which uses a progressive transfer learning strategy to recast macrocycle structural modification as macrocyclic chemical space exploration. For this specific task, the HyperTemp sampling algorithm was designed to improve the diversity and quality of generated macrocycles. The HyperTemp sampling algorithm in conjunction with CycleGPT can achieve structural modification of macrocycles, including macrocyclic scaffold hopping and modification of substituents on the macrocyclic ring, which is in line with the modification strategies of medicinal chemists.

From the molecules generated by CycleGPT-HyperTemp, six compounds were selected for synthesis based on JAK2 activity prediction and molecular docking scores. Subsequent biological tests revealed that compounds 1, 2, and 6 showed high inhibitory activity against JAK2 at both the enzyme and cellular levels, with compound 2 displaying an improved selectivity profile against 468 kinases compared to Pacritinib and Fedratinib. In addition, compound 2 showed potential for treating polycythemia. This prospective case highlights the practicality of CycleGPT combined with HyperTemp sampling for exploring macrocyclic chemical space. Moreover, it provides potential macrocyclic drug candidates for further development of JAK2-targeted therapies. We expect that CycleGPT will significantly enhance macrocyclic drug design as an advanced extension of current computational methods, enabling efficient exploration and modification of emerging candidate compounds.

Methods

Datasets

Bioactive molecules with activity values (IC50, EC50, Kd, Ki) < 1 μM were collected from the ChEMBL19 database. The molecules were converted into canonical SMILES strings using RDKit, with only those no longer than 140 tokens retained. After removing stereochemistry, salts, and duplicate molecules, 365,063 active molecules were eventually obtained as unique SMILES strings.
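The filtering steps described above can be sketched as follows. This is a simplified illustration, not the pipeline actually used: it assumes the input SMILES are already canonical, stereo-free, and desalted (which would be done with RDKit in practice), treats one character as one token, and uses hypothetical example records.

```python
def filter_actives(records, max_activity_um=1.0, max_len=140):
    """Keep unique SMILES with activity below the cutoff and length within the limit.

    records: iterable of (smiles, activity_in_uM) pairs.
    """
    seen = set()
    kept = []
    for smiles, activity_um in records:
        if activity_um >= max_activity_um:   # require activity value < 1 uM
            continue
        if len(smiles) > max_len:            # assumption: one character = one token
            continue
        if smiles in seen:                   # drop duplicate molecules
            continue
        seen.add(smiles)
        kept.append(smiles)
    return kept


# Hypothetical records: (SMILES, activity in uM)
records = [
    ("CCO", 0.5),
    ("CCO", 0.2),          # duplicate structure
    ("c1ccccc1", 5.0),     # not active enough
    ("C" * 141, 0.1),      # too long
    ("CCN", 0.999),
]
actives = filter_actives(records)
```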

The macrocycles used for progressive transfer learning mainly came from two databases, ChEMBL and Drugbank20. Only macrocycles with SMILES strings shorter than 140 tokens in length were retained. We integrated and removed duplicate molecules from both datasets, resulting in a total of 19,920 macrocycles.

CycleGPT model

CycleGPT was implemented based on the nanoGPT architecture (https://github.com/karpathy/nanoGPT), which uses PyTorch to construct the GPT network. In CycleGPT, we used the auto-regressive approach to generate macrocycle SMILES. For a sequence \(x=({x}_{1},{x}_{2},\ldots ,{x}_{T})\), the auto-regressive model decomposes the joint probability distribution through the chain rule:

$${p}_{\theta }(x)=\mathop{\prod }\limits_{t=1}^{T}{p}_{\theta }({x}_{t}|{x}_{ < t}),$$
(1)

where \({x}_{t}\) denotes the t-th token, \({x}_{ < t}\) denotes the tokens preceding it, and \(\theta\) represents the model parameters. When training the auto-regressive model, we minimized the cross-entropy loss to optimize the model parameters \(\theta\).
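As a small worked illustration of Eq. (1) and the training objective, the sketch below multiplies per-step conditional probabilities into a joint probability and computes the corresponding per-token cross-entropy. The probability values are hypothetical, and averaging the loss over tokens is an assumption (it is the common convention, but the normalization is not stated here).

```python
import math


def sequence_log_prob(step_probs):
    """Sum of log p(x_t | x_<t), i.e. the log of the product in Eq. (1)."""
    return sum(math.log(p) for p in step_probs)


def cross_entropy(step_probs):
    """Mean negative log-likelihood per token (assumed normalization)."""
    return -sequence_log_prob(step_probs) / len(step_probs)


# Hypothetical probabilities the model assigns to each true next token
step_probs = [0.9, 0.5, 0.8, 0.25]
joint = math.exp(sequence_log_prob(step_probs))  # equals 0.9 * 0.5 * 0.8 * 0.25
loss = cross_entropy(step_probs)
```

Working in log space avoids numerical underflow for long sequences, which is why the product is usually evaluated as a sum of logarithms.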

CycleGPT used 12 Transformer blocks with a hidden dimension of 768. The maximum length of input and generated SMILES was set to 140 tokens, and the embedding size of each token was 768. It employed causal self-attention with 12 attention heads, whose outputs are concatenated and projected into the final output. The scaled dot-product attention layer takes three matrices as inputs: a matrix \({{\bf{Q}}}\) of queries, a matrix \({{\bf{K}}}\) of keys, and a matrix \({{\bf{V}}}\) of values. The attention calculation is as follows:

$${{\rm{attention}}}({{\bf{Q}}},{{\bf{K}}},{{\bf{V}}})={{\rm{softmax}}}\left(\frac{{{\bf{Q}}}\cdot {{{\bf{K}}}}^{{{\bf{T}}}}}{\sqrt{{d}_{k}}}\right){{\bf{V}}},$$
(2)

where \({d}_{k}\) is the dimension of the key vectors, so that \(\sqrt{{d}_{k}}\) acts as a scaling factor. We employed the Lion optimizer21, released by Google in 2023, to optimize the parameters.
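For illustration, Eq. (2) with a causal mask can be written for a single head in plain Python as below. This is a didactic sketch rather than the nanoGPT code: it assumes that causality is enforced by letting position t attend only to positions up to t, and the small matrices are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with position t attending only to positions <= t."""
    d_k = len(K[0])
    out = []
    for t, q in enumerate(Q):
        # Scaled dot products between query t and keys 0..t (future keys are masked out)
        scores = [
            sum(qi * ki for qi, ki in zip(q, K[s])) / math.sqrt(d_k)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors
        out.append([
            sum(w * V[s][j] for s, w in enumerate(weights))
            for j in range(len(V[0]))
        ])
    return out


# Tiny made-up example: two tokens, d_k = 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = causal_attention(Q, K, V)
```

The first output row equals the first value vector because the first token can only attend to itself, which is the property that makes left-to-right SMILES generation possible.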

Sampling strategy

The sampling strategy of a language model plays a key role in determining generative quality. There are many popular sampling schemes in the literature, including MaxEntropy sampling, Random mask sampling, Top-K sampling, Noised top-K sampling, Top-p sampling, Tempered sampling, and Tempered top-K sampling40. However, these sampling schemes were mostly designed for generating text sequences. When they are used directly for generating SMILES sequences, the quality of the generated molecules remains to be improved, especially in terms of the balance between the validity and the novelty of the molecules (see Supplementary Table 3).

We designed a sampling algorithm named HyperTemp for generating high-quality SMILES sequences. The main idea of HyperTemp is to introduce a nonlinear transform that adaptively modulates the output probabilities computed by the language model for each token, and then to apply multinomial sampling to the rectified probabilities. Specifically, the nonlinear transform is chosen as the hyperbolic tangent (tanh) function, defined as:

$${\hat{p}}_{i}=\frac{{p^{\prime} }_{i}}{{\sum }_{j=1}^{N}{p^{\prime} }_{j}},\quad {\rm{with}}\quad {p^{\prime} }_{i}=\tanh \left(\frac{\exp (\log ({p}_{i})/T)}{{\sum }_{j=1}^{N}\exp (\log ({p}_{j})/T)}\right),$$
(3)

where \(N\) is the vocabulary size, \({p}_{i}\) represents the output probability of the i-th token in the vocabulary, and \(T\) is an adjustable temperature. The rectified probabilities \({\hat{p}}_{i}\) are then used to sample the next token.
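A minimal sketch of Eq. (3) in plain Python is given below to show how the transform reshapes a distribution. The function names and the three-token example distribution are illustrative assumptions, not the released implementation, and the input probabilities are assumed to be strictly positive so that the logarithm is defined.

```python
import math
import random


def tempered(probs, T):
    """Temperature-scaled probabilities: exp(log(p_i) / T), renormalized."""
    scaled = [math.exp(math.log(p) / T) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def hypertemp(probs, T):
    """Eq. (3): tanh applied to the tempered probabilities, then renormalized."""
    squashed = [math.tanh(q) for q in tempered(probs, T)]
    total = sum(squashed)
    return [s / total for s in squashed]


def sample_token(probs, T, rng):
    """Draw one token index from the rectified distribution (multinomial sampling)."""
    return rng.choices(range(len(probs)), weights=hypertemp(probs, T), k=1)[0]


probs = [0.7, 0.2, 0.1]            # hypothetical next-token probabilities
base = tempered(probs, T=1.0)      # plain tempered sampling as the baseline
rect = hypertemp(probs, T=1.0)     # HyperTemp-rectified probabilities
token = sample_token(probs, 1.0, random.Random(0))
```

For this example at T = 1, the top-token probability falls from 0.70 to about 0.67 while the other two rise to about 0.22 and 0.11, and the ranking of the tokens is unchanged.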

The HyperTemp sampling strategy employs a nonlinear transformation to recalibrate token probabilities. This transformation moderately attenuates the top-ranked token probability while proportionally enhancing suboptimal token probabilities, thereby expanding the exploration of molecular candidates at each step. Crucially, the tanh function preserves the original relative ranking of tokens due to its monotonicity, ensuring that sequence generation adheres to chemically valid and syntactically regular patterns. By dynamically balancing exploitation of high-probability tokens with exploration of alternative pathways, HyperTemp achieves superior equilibrium between molecular novelty and structural validity compared to conventional sampling approaches (Supplementary Table 1).

It is noteworthy that increasing the temperature during the sampling process can enhance the novelty of the generated samples, a technique widely used in the AI literature. However, this adjustment is restricted to a narrow range, since excessively high temperatures substantially compromise the validity (or normality) of the generated samples. Therefore, within the usable temperature range, the capacity of elevated temperatures to improve sampling novelty remains limited. In comparison, the proposed HyperTemp sampling scheme offers a robust way to balance novelty and validity. Empirical results demonstrate its efficacy in shifting probability toward suboptimal tokens to improve the novelty of generation (Fig. 2a–c and Supplementary Table 1).

JAK2 activity prediction

CyclePred, which was implemented based on MacFrag41 and PharmHGT42, was used to predict the JAK2 activity of generated macrocycles for further screening. The training data were the experimental IC50 values of 8157 JAK2 inhibitors from PubChem43, ChEMBL19, and BindingDB44. CyclePred uses MacFrag to segment each molecule into its smallest building blocks, combines atomic information with the MacFrag building-block information to construct a heterogeneous graph of the macrocycle, and then applies a multi-view message-passing block followed by a readout block.

Model evaluation metrics

The performance of CycleGPT was evaluated using metrics widely used in previous work on molecule generation22, including the following:

Validity refers to the percentage of generated valid molecules in the sampled molecules.

Macrocycle_ratio evaluates the proportion of compounds that are macrocycles in the generated molecules.

Novel_unique_macrocycles refers to the percentage of generated molecules that are macrocycles, unique (after removing duplicate macrocycles), and not present in the training dataset. It is a comprehensive metric that considers the novelty, uniqueness, and validity (as macrocycles) of generated molecules.
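The three metrics above can be computed as in the following sketch. It assumes that each sampled string has already been parsed and canonicalized upstream (for example with RDKit), giving `None` for invalid SMILES, and that a flag indicating whether the molecule is a macrocycle has been precomputed. Using the number of sampled strings as the denominator for all three metrics is also an assumption, and the input data are hypothetical.

```python
def generation_metrics(samples, training_set):
    """Compute validity, macrocycle_ratio, and novel_unique_macrocycles.

    samples: list of (canonical_smiles_or_None, is_macrocycle) per sampled string.
    training_set: collection of canonical SMILES seen during training.
    """
    n = len(samples)
    valid = [(smi, is_macro) for smi, is_macro in samples if smi is not None]
    macrocycles = [smi for smi, is_macro in valid if is_macro]
    # Unique macrocycles that do not appear in the training data
    novel_unique = set(macrocycles) - set(training_set)
    return {
        "validity": len(valid) / n,
        "macrocycle_ratio": len(macrocycles) / n,
        "novel_unique_macrocycles": len(novel_unique) / n,
    }


# Hypothetical outputs: placeholder identifiers stand in for canonical SMILES
samples = [("A", True), ("A", True), ("B", True), ("C", False), (None, False)]
metrics = generation_metrics(samples, training_set={"B"})
```

In this toy case only macrocycle "A" is both new and counted once, so the combined metric is much lower than the validity, mirroring how a model can produce many valid macrocycles yet few novel ones.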

In addition, we evaluated the properties in MOSES, including internal diversity (IntDiv, IntDiv2), Fréchet ChemNet Distance (FCD), Similarity to the nearest neighbor (SNN), Fragment similarity (Frag), and Scaffold similarity (Scaff), together with molecular weight, octanol-water partition coefficient (log P), quantitative estimation of drug-likeness (QED), and synthetic accessibility score (SA).

The performance of CyclePred was evaluated using the root mean square error (RMSE) and R-squared (R2), which are commonly used metrics for regression models in machine learning and deep learning. We retain four decimal places to maintain the precision and consistency of our results.
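For reference, both regression metrics can be computed as in the sketch below. R2 is taken here to be the coefficient of determination, 1 − SS_res/SS_tot, which is an assumption (a squared Pearson correlation is another common convention), and the activity values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between experimental and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical experimental and predicted activity values
y_true = [6.0, 7.0, 8.0, 9.0]
y_pred = [6.5, 7.0, 7.5, 9.0]
error = rmse(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```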

Molecular docking

The crystal structure of JAK2 bound to Fedratinib (PDB code 6VNE) was selected as the reference structure for our docking simulations. Maestro and Rosetta were used to evaluate the potential of generated macrocycles against JAK2.

In Maestro docking, we prepared the protein using the Protein Preparation Wizard in Maestro v11.5. A grid-enclosing box was centered on the crystal ligand, and the van der Waals radii of receptor atoms with partial atomic charges less than 0.15 were scaled by a factor of 0.8 to soften the non-polar parts of the receptor. The three-dimensional structure of each compound was generated and minimized using the LigPrep v3.3 module. The molecules were docked to the binding site using the Glide standard precision (SP) method with default parameters, and only the top pose of each molecule was retained. The ligand binding energy of each docked binding mode was then calculated using Prime MM-GBSA.

In Rosetta docking, we initially generated a set of low-energy conformations of the macrocycles using Maestro. The molfile_to_params.py script of Rosetta was employed to process the conformation file and subsequently incorporate the small molecule into the protein file. Finally, docking was carried out using the rosetta_scripts function.

Enzyme assay

Recombinant JAK1, JAK2, JAK3, and TYK2 proteins were produced using the baculovirus expression system, and JAK kinase activity was measured using the Z′-LYTE™ Kinase Assay Platform45. In short, the reaction mixture contained 5 μL of enzyme and Z′-LYTE Tyr6/3 peptide substrate (4 μM) and 2.5 μL of ATP (10 μM for JAK1, 25 μM for JAK2, 20 μM for JAK3, and 25 μM for TYK2), and was preincubated with various concentrations of the tested compounds at room temperature for 1 h, after which the development reagent was added to each well. After incubation at room temperature for another hour, a stop reagent was added to terminate the reaction. Fluorescence was measured with excitation at 400 nm and emission at 445 and 520 nm. IC50 values were calculated by sigmoidal curve fitting in GraphPad Prism 8.0.
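
To illustrate how an IC50 is obtained from a sigmoidal dose-response relationship, the sketch below uses a simplified stand-in for the nonlinear fit performed in GraphPad Prism: a Hill-plot linearization. It assumes the inhibition fraction runs from 0 to 1 and follows f = 1 / (1 + (IC50 / c)^h), so that logit(f) is linear in ln(c). The function name, concentrations, and parameter values are hypothetical and the data are synthetic.

```python
import math


def fit_ic50_hill(concentrations, inhibition_fractions):
    """Estimate (IC50, Hill slope) by least squares on logit(f) versus ln(c).

    Assumes a sigmoid with bottom 0 and top 1; points with f outside (0, 1)
    are skipped because the logit is undefined there.
    """
    xs, ys = [], []
    for c, f in zip(concentrations, inhibition_fractions):
        if 0.0 < f < 1.0:
            xs.append(math.log(c))
            ys.append(math.log(f / (1.0 - f)))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points with 0 < f < 1")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # Hill slope h
    intercept = mean_y - slope * mean_x  # equals -h * ln(IC50)
    return math.exp(-intercept / slope), slope


# Synthetic, noise-free data generated from assumed parameters (units arbitrary).
true_ic50, true_hill = 50.0, 1.2
concs = [1000.0 / 3 ** i for i in range(8)]
fracs = [1.0 / (1.0 + (true_ic50 / c) ** true_hill) for c in concs]

ic50, hill = fit_ic50_hill(concs, fracs)
print(f"IC50 = {ic50:.2f}, Hill slope = {hill:.2f}")
```

With real, noisy measurements a nonlinear least-squares fit with free top and bottom plateaus, as done in curve-fitting software, is generally more robust than this linearization.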

Cell proliferation assay

HEL 92.1.7 and SET-2 cells were seeded in 96-well plates at 5000 cells/well in 70 μL of RPMI medium with 10% FBS and incubated overnight at 37 °C with 5% CO2. Tested compounds (30 μL) were then added, starting from an initial concentration of 25 μM with 3-fold serial dilution. After 72 h, 10 μL of Cell Counting Assay Kit-8 solution was added to each well, and the plates were incubated for 2 h at 37 °C. The absorbance was measured at 450 nm using a microplate reader with a reference wavelength of 630 nm. All experiments were performed in triplicate, and the data were plotted in GraphPad Prism 8.0 to determine the half-maximal inhibitory concentration (IC50) values.
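
The dilution scheme above can be written out as a short calculation. The sketch assumes eight concentration points, which the protocol does not state, and the helper names are hypothetical. The second helper shows the stock concentration that would be needed if 25 μM is read as the final in-well concentration after adding 30 μL to 70 μL of medium; the text does not specify whether 25 μM refers to the final or the added solution.

```python
def dilution_series(top_conc_um, fold, n_points):
    """Concentrations (μM) of a serial dilution starting at top_conc_um."""
    return [top_conc_um / fold ** i for i in range(n_points)]


def stock_for_final(final_conc_um, added_volume_ul, medium_volume_ul):
    """Stock concentration needed so the well reaches final_conc_um after mixing."""
    total_volume_ul = added_volume_ul + medium_volume_ul
    return final_conc_um * total_volume_ul / added_volume_ul


# 25 μM top concentration with 3-fold dilution; 8 points is an assumption.
series = dilution_series(25.0, 3.0, 8)
for conc in series:
    print(f"{conc:.4f} μM")

# Only relevant if 25 μM is interpreted as the final in-well concentration.
stock = stock_for_final(25.0, 30.0, 70.0)
print(f"stock needed: {stock:.2f} μM")
```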

Western blot analysis

HEL 92.1.7 cells were seeded into a 6-well plate (1 × 10^6 cells/well) and incubated. After overnight growth, cells were treated with or without compound 2 for 4 h. The cells were then collected and lysed, and the protein concentration was determined. For immunoblotting, the protein samples were separated by SDS-PAGE and transferred to a PVDF membrane. The membrane was blocked with 5% skim milk in TBST for 2 h at room temperature and incubated with the p-JAK2, p-STAT5, p-STAT3, p-AKT, p-ERK, JAK2, STAT5, STAT3, ERK, AKT, or β-actin primary antibody overnight at 4 °C. The membrane was then washed with TBST, incubated with HRP-conjugated secondary antibody at room temperature for 2 h, and visualized by chemiluminescence using an enhanced ECL immunoblotting system (Tanon, China). All experiments were performed in triplicate and analyzed using ImageJ and GraphPad Prism 8.0.

In vivo efficacy study

Female BALB/c mice aged 8–10 weeks (17–20 g body weight) were subcutaneously injected with 10 units of recombinant human erythropoietin (rhEPO) daily for 4 consecutive days and orally administered either vehicle or compounds. Controls were injected with corresponding volumes of saline buffer. On day 5, mice were euthanized within 24 h after the last administration. Blood was collected by orbital blood collection into microtubes containing EDTA-K2 for routine blood testing on an IDEXX ProCyte Dx® analyzer. The spleens and bone marrow were then isolated and placed in 1× PBS on ice for subsequent testing. For flow cytometric analysis, the spleens and bone marrow were homogenized and filtered through a 70 µm cell strainer to obtain single cells. Red blood cells were lysed with RBC Lysis Buffer, and the remaining cells were washed in 1× PBS and incubated with TruStain FcX™ (anti-mouse CD16/32) to block Fc receptors and reduce non-specific staining. Cells were stained with APC-conjugated Ter119 and PE-conjugated CD71 antibodies in the dark at room temperature for 30 min. After washing twice, cells were resuspended in Cell Staining Buffer and analyzed by flow cytometry. Data were analyzed using GraphPad Prism software. The results of in vivo experiments are expressed as mean ± SEM. Statistical comparisons between groups were performed on group means using one-way ANOVA followed by Dunnett’s test. P < 0.05 was considered to indicate a statistically significant difference. All animal experiments were conducted in accordance with the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health and were approved by the Animal Ethics Committee of East China University of Science and Technology (Permit Number: ECUST‑2022‑035).
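
For readers less familiar with the summary statistics named above, the following sketch computes mean ± SEM for each group and the one-way ANOVA F statistic. The group names and measurements are hypothetical values chosen for illustration, not data from this study. Dunnett’s post hoc test and the P value calculation are not implemented here; they were performed in GraphPad Prism.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample SD."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for a vehicle group and two dose groups.
groups = {
    "vehicle": [10.0, 12.0, 14.0],
    "low dose": [8.0, 9.0, 10.0],
    "high dose": [4.0, 5.0, 6.0],
}

for name, values in groups.items():
    m, sem = mean_sem(values)
    print(f"{name}: {m:.2f} ± {sem:.2f}")

f_stat = one_way_anova_f(list(groups.values()))
print(f"F = {f_stat:.2f}")
```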

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.