Introduction

Alzheimer’s disease (AD) is one of the most severe conditions affecting the brains of older people and is rapidly growing in importance as a public health issue1. The worldwide burden of AD is projected to rise significantly, from 26.6 million cases in 2006 to an estimated 106.8 million by 20502,3. It is a progressive neurodegenerative disorder characterized by the accumulation of β-amyloid plaques and tau-containing neurofibrillary tangles in the brain4,5,6. These pathological changes disrupt synaptic homeostasis and interfere with critical endosomal and lysosomal clearance pathways, contributing to cognitive impairment7.

The pathology of AD is marked by neuronal loss and atrophy, particularly in regions crucial for memory and cognition7. The hippocampus, amygdala, entorhinal cortex, and cortical association areas of the frontal, temporal, and parietal lobes are among the most affected regions. Tangles first form in the trans-entorhinal cortex, then spread to the entorhinal cortex, the hippocampal CA1 region, and finally the cortical association areas of the frontal, parietal, and temporal lobes8. Tau protein accumulation correlates strongly with brain atrophy, notably hippocampal atrophy, and with cognitive deterioration. This loss of neurons and atrophy in the temporal and frontal cortex not only contributes to inflammation but also promotes the deposition of amyloid plaques and abnormal clusters of protein fragments9,10,11. These pathological changes activate microglial cells and increase the presence of monocytes and macrophages in the cerebral cortex, further exacerbating disease progression12,13,14.

DYRK1A is a proline-directed protein kinase encoded by a single gene located at the 21q22.2 locus of chromosome 2113,15. This is significant because individuals with Down syndrome, who carry an extra copy of chromosome 21, often develop Alzheimer’s disease-like neuropathology, including tau tangles and amyloid plaques, at an early age16. DYRK1A is known to phosphorylate tau at multiple sites, including serine 202, threonine 212, and serine 404, all of which are hyperphosphorylated in AD brains. Additionally, DYRK1A facilitates tau alternative splicing, influencing cellular processes such as apoptosis and neurodegeneration13. Increased DYRK1A immunoreactivity has been reported in the cytoplasm and nuclei of scattered neurons in the entorhinal cortex, hippocampus, and neocortex in neurodegenerative diseases associated with tau phosphorylation, including Alzheimer’s disease17,18,19. This aberrant overexpression of DYRK1A underscores its role in promoting tau pathology and highlights its involvement in the progression of AD16.

Beyond its effects on tau, DYRK1A phosphorylates other substrates, such as the transcription factor CREB, which is crucial for learning and memory20. Dysregulation of DYRK1A in AD brains has been linked to elevated mRNA levels, suggesting its overexpression contributes to disease progression21. Moreover, DYRK1A phosphorylates amyloid precursor protein (APP), implicating it in the production and accumulation of β-amyloid peptides22.

Computer-aided drug design tools have accelerated the drug discovery process and reduced its cost. Combinatorial library creation, molecular docking, virtual high-throughput screening, QSAR, QSPR, comparative modeling, ADME/Tox research, and other techniques are part of the contemporary drug development process. When new drugs are designed and developed, pharmacokinetic and pharmacodynamic features are now routinely optimized using QSAR analysis23,24. QSAR plays a vital role in drug design by using statistical models to correlate the biological activity of compounds with their molecular properties. This approach provides medicinal chemists with a powerful tool to explore chemical space, predict the bioactivity of potential drug candidates, and optimize lead compounds25,26,27,28.

Materials and methods

Compound library retrieval

A total of 192 compounds were retrieved from the SuperNatural 3.0 database (Table 1) based on their molecular weight, LogP, HBA, and HBD values (https://bioinf-applied.charite.de/supernatural_3/subpages/compounds.php). The compounds were selected based on the parameter ranges of 250–500 for molecular weight, 2.5–4.0 for LogP, 5–9 for HBA, and 2–5 for HBD. Additionally, abemaciclib was included as a control drug. Furthermore, 1,700 drug compounds were sourced from the ChEMBL database, selected based on their IC50 values, for the development of a QSAR model.
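
The selection criteria above amount to a simple range filter. The following is a minimal sketch of such a filter, assuming the quoted ranges are inclusive; the `Compound` class, the `passes_filter` helper, and the three example records are hypothetical and are not taken from the SuperNatural 3.0 database or from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mw: float    # molecular weight (Da)
    logp: float  # lipophilicity
    hba: int     # hydrogen bond acceptors
    hbd: int     # hydrogen bond donors


# Ranges quoted in the text; treating them as inclusive is an assumption.
RANGES = {"mw": (250.0, 500.0), "logp": (2.5, 4.0), "hba": (5, 9), "hbd": (2, 5)}


def passes_filter(c: Compound) -> bool:
    """Return True when every property lies inside its range."""
    return all(lo <= getattr(c, key) <= hi for key, (lo, hi) in RANGES.items())


# Hypothetical records for illustration only.
library = [
    Compound("example_A", 412.4, 3.1, 7, 3),  # inside every range
    Compound("example_B", 236.3, 3.0, 6, 2),  # molecular weight too low
    Compound("example_C", 455.5, 4.6, 8, 4),  # LogP too high
]
selected = [c.name for c in library if passes_filter(c)]
print(selected)
```

Keeping the ranges in one dictionary makes it easy to see, and to change, exactly which cut-offs were applied.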

Retrieval and preparation of protein target

The X-ray crystal structure of human DYRK1A kinase (PDB ID: 7O7K), the target protein for this study, was obtained from the Protein Data Bank (PDB) (https://www.rcsb.org/). This structure has a resolution of 1.82 Å and consists of 361 amino acid residues (Fig. 1). Protein preparation was carried out using the Protein Preparation Wizard workflow in Maestro (Version 12.5). The process involved refinement through optimization and minimization using the OPLS3e force field. Heteroatoms and water molecules were removed, while missing residues, loops, and side chains were rebuilt using the Prime module in Maestro. Hydrogens, including non-polar hydrogens, were added, and Gasteiger charges were assigned to the structure. Partial charges were distributed on the rebuilt atoms of amino acid residues to ensure charge consistency within the system.

Preparation of database-curated compounds and control

The 2D structures of the database-curated compounds and the control drug (abemaciclib) were retrieved in structure data file (SDF) format from PubChem (https://pubchem.ncbi.nlm.nih.gov/compound/), a leading repository of free-access chemical compound data associated with the National Center for Biotechnology Information (NCBI), and COCONUT (https://coconut.naturalproducts.net/), an open collection of natural products. Abemaciclib was used as the reference drug because it has been co-crystallized with DYRK1A (PDB ID: 7O7K), which provided a validated structural template for the docking studies. In addition, abemaciclib has been reported to inhibit DYRK1A to the same extent as CDK4 and CDK615. Subsequently, the LigPrep tool in Maestro 12.5 was used to convert the 192 compounds and the control drug into their most stable and energetically favorable conformations, using the OPLS3e force field.

Receptor grid generation, virtual screening, and molecular docking

Using the Receptor Grid Generation (RGG) tool of Glide in Maestro 12.5, a grid box was created around the binding site of the prepared protein. This grid covers the site occupied by the protein’s co-crystallized ligand, which marks the interaction region between the ligand and the receptor. A cubic grid box covering all amino acid residues at the site was generated with coordinates: x = 72, y = 72, and z = 72 Å. The prepared compound library was then subjected to virtual screening using the Ligand Docking tool of Glide in Maestro. The compounds were docked with the target protein to identify those with the best binding affinity, with ligand sampling set to flexible rather than rigid. Initially, the high-throughput virtual screening (HTVS) algorithm screened the compound library against the DYRK1A active site. The top 164 compounds with the lowest HTVS docking scores and the control drug were further screened using the standard precision (SP) algorithm. Then, the extra precision (XP) algorithm was used to rescreen the top 52 compounds, those with SP docking scores of −8.0 or lower, alongside the control drug. After molecular docking, the binding free energies of the protein-ligand complexes were calculated using the Prime Molecular Mechanics with Generalized Born and Surface Area (MMGBSA) post-docking protocol, employing the variable dielectric Solvent Generalized Born (VSGB) solvation model and the OPLS3e force field29.
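
The staged HTVS, SP, and XP funnel can be pictured as successive filters on docking scores. The sketch below is illustrative only: it assumes that lower (more negative) scores are better, as the phrase "lowest docking scores" implies, and it reads the SP cut-off as −8.0 or lower. The helper names and all scores are hypothetical, and no Glide API is used.

```python
def keep_top_n(scores: dict, n: int) -> dict:
    """Keep the n entries with the lowest (most negative) scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return dict(ranked[:n])


def keep_at_or_below(scores: dict, cutoff: float) -> dict:
    """Keep entries whose score is at or below the cut-off."""
    return {name: s for name, s in scores.items() if s <= cutoff}


# Hypothetical first-stage scores (the study kept the top 164 at this stage).
htvs = {"cpd1": -7.9, "cpd2": -5.2, "cpd3": -9.4, "cpd4": -6.8}
stage1 = keep_top_n(htvs, 3)

# Hypothetical re-scoring of the survivors at higher precision.
sp = {"cpd1": -8.3, "cpd3": -10.1, "cpd4": -7.2}
stage2 = keep_at_or_below({name: sp[name] for name in stage1}, -8.0)

print(sorted(stage2))  # survivors that would go on to the final stage
```

Each stage only re-scores what survived the previous one, which is what keeps the most expensive scoring mode affordable.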

ADMET prediction

The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of the compounds were predicted using the AI Drug Lab server (https://ai-druglab.smu.edu/)30. This analysis, crucial in early drug discovery, assessed the pharmacokinetics of the potential drug candidates. The SMILES strings of the compounds retrieved from PubChem and COCONUT were submitted to the server to predict their pharmacokinetic behaviour.

In silico prediction of biological activity

A machine learning-based quantitative structure-activity relationship (QSAR) analysis was conducted to predict the biological activity of the hit compounds and the control, expressed as pIC50. The analysis was intended to complement the docking, ADMET, and molecular dynamics analyses by providing a quantitative estimate of biological activity derived from molecular descriptors. Unlike docking, which assesses binding affinity, QSAR captures structure–activity relationships and identifies the key physicochemical characteristics that govern DYRK1A inhibition. It helps to check that the selected hit compounds are not only potent binders but also structurally consistent with known inhibitory patterns. Experimental inhibitors of DYRK1A and their activity values were obtained from the ChEMBL database (https://www.ebi.ac.uk/chembl/) using the protein’s amino acid sequence. The AutoQSAR tool in Schrödinger Maestro 12.5 was used to extract molecular descriptors from the retrieved dataset of 525 compounds. The descriptors included physicochemical properties such as molecular weight, LogP, and polar surface area, as well as partial charges and counts of hydrogen bond donors and acceptors. Higher lipophilicity (LogP) and lower polar surface area are associated with higher pIC5031. Hydrogen bond donors and acceptors also correlate positively with bioactivity, reflecting the importance of hinge-region hydrogen bonding in kinase–ligand interactions32. The dataset was divided into a 75% training set (393 compounds) and a 25% test set (132 compounds) to construct and evaluate statistical models. The training set was used to develop QSAR models correlating the molecular descriptors of the compounds with their pIC50 values. The IC50 values were transformed into pIC50 (–log10 IC50) to normalise their skewed distribution and to linearise the relationship between the descriptors and activity values.
Model performance was assessed using metrics such as the coefficient of determination (R²), cross-validated determination coefficient (Q²), root mean square error (RMSE), standard deviation (SD), and ranking score. The optimal QSAR model was identified and subsequently used to predict the pIC50 values of the final set of compounds.
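
The pIC50 transformation and the 393/132 split can be illustrated with a short sketch. It assumes IC50 values are supplied in nanomolar and that the split is a seeded random shuffle; the text does not state how compounds were assigned to each set, and the function names are hypothetical.

```python
import math
import random


def pic50_from_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); the input is given in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def split_indices(n: int, n_train: int, seed: int = 0):
    """Shuffle indices 0..n-1 reproducibly and cut them into two sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


# 525 compounds split into 393 training and 132 test compounds, as in the text.
train_idx, test_idx = split_indices(525, 393)

print(round(pic50_from_nm(1000.0), 3))  # an IC50 of 1 µM
print(len(train_idx), len(test_idx))
```

On this scale a tenfold drop in IC50 raises pIC50 by one unit, which is why the transformed values are easier to model than raw concentrations.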

Molecular dynamics simulations

The molecular dynamics (MD) simulations of the protein-ligand complexes were performed using the Desmond package in Maestro to evaluate their stability under physiological conditions33. Each selected complex was first solvated in a box of simple point charge (SPC) water extending 10 Å beyond the atoms of the complex. To ensure system neutrality, counter ions (Na+, Cl–) were added, with a salt concentration of 0.15 M. The particle mesh Ewald (PME) method, as described by Kumar and Higdon34, was applied to account for electrostatic interactions, with a Lennard-Jones cutoff of 10 Å. Additionally, the SHAKE algorithm was used to constrain covalent bonds involving hydrogen atoms, and periodic boundary conditions were applied following standard simulation protocols35,36.

To prepare the system for simulation, energy minimization was performed for 2000 steps using the steepest descent algorithm combined with the limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) method. This approach was chosen for its efficiency in optimizing large systems with numerous degrees of freedom and because it accommodates periodic boundary conditions, under which a small portion of the system is simulated while the long-range interactions characteristic of bulk materials are still captured. Subsequently, a 100 ps simulation was run in the NVT ensemble at 300 K, followed by a further 100 ps in the NPT ensemble at the same temperature, to stabilize non-hydrogen solute atoms. During these initial stages, temperature and pressure were regulated using Berendsen thermostats and barostats37. For the final production run, a Nosé-Hoover thermostat maintained the temperature at 300 K, while a Martyna-Tobias-Klein barostat maintained the pressure at 1 bar38. The production simulation was carried out for 200 ns with a time step of 2 fs. To analyze the resulting MD trajectory, multiple parameters were assessed, including root mean square deviation (RMSD), root mean square fluctuation (RMSF), contact maps, and interaction profiles.
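
As a quick check on the scale of this protocol, the number of integration steps follows directly from the run length and the time step. The sketch below uses integer femtoseconds to avoid rounding; applying the 2 fs time step to the equilibration stages is an assumption, since the text gives it only for the production run.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000


def n_steps(duration_fs: int, timestep_fs: int) -> int:
    """Number of integration steps needed to cover a given duration."""
    if duration_fs % timestep_fs:
        raise ValueError("duration is not a whole number of time steps")
    return duration_fs // timestep_fs


# Production run: 200 ns at 2 fs per step (values from the text).
production_steps = n_steps(200 * FS_PER_NS, 2)

# Each 100 ps equilibration stage, ASSUMING the same 2 fs time step.
equilibration_steps = n_steps(100 * FS_PER_PS, 2)

print(production_steps, equilibration_steps)
```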

Free energy calculation

The binding free energy of each protein-ligand complex was computed using the gmx_MMPBSA tool, as described by Genheden & Ryde and Valdés-Tresanco et al.39,40. The MD simulation trajectories obtained from the Desmond module, which included explicit water molecules, were converted using Schrödinger scripts into the GROMACS trajectory format required for free energy estimation. The topology files for both the protein and ligand were generated separately by converting *.cms files to *.gro and *.top formats using the InterMol software. The method estimates the binding free energy (∆G_bind) of non-covalently bound complexes based on the following equation, as previously described by Thapa & Raghavachari41:

$$\Delta G_{{bind}} = \left\langle {G_{{COM}} } \right\rangle - \left\langle {G_{{REC}} } \right\rangle - \left\langle {G_{{LIG}} } \right\rangle$$
(1)

where G_COM, G_REC, and G_LIG represent the free energies of the complex, receptor, and ligand, respectively. Each of these terms can be further expanded as:

$$\left\langle {G_{X} } \right\rangle = \left\langle {E_{{MM}} } \right\rangle + \left\langle {G_{{SOL}} } \right\rangle - \left\langle {TS} \right\rangle$$
(2)

Similarly, the binding free energy can be expressed as:

$$\Delta G_{{bind}} = \Delta H - T\Delta S$$
(3)

where ∆H represents the binding enthalpy and −T∆S corresponds to the entropy change upon ligand binding. The entropic contribution is often omitted, in which case the computed value represents an effective free energy, which is typically sufficient for comparing the relative binding free energies of different ligands. The binding enthalpy ∆H can be decomposed into the following components:

$$\Delta H{\text{ }} = {\text{ }}\Delta E_{{MM}} + \Delta G_{{SOL}}$$
(4)

Where:

$$\Delta E_{{MM}} = \Delta E_{{bonded}} + \Delta E_{{nonbonded}} = \left( {\Delta E_{{bond}} + \Delta E_{{angle}} + \Delta E_{{dihedral}} } \right) + \left( {\Delta E_{{ele}} + \Delta E_{{vdW}} } \right)$$
(5)

And:

$$\Delta G_{{SOL}} = \Delta G_{{pol}} + \Delta G_{{non - pol}} = \Delta G_{{PB/GB}} + \Delta G_{{non - pol}}$$
(6)
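
The decomposition in Eqs. (4) to (6) can be followed with a small worked example. The sketch below simply adds the component terms; the `DeltaTerms` class is hypothetical and every number is a placeholder rather than a result from this study. The bonded terms are set to zero on the assumption of a single-trajectory protocol, in which they cancel between the complex and its separated parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaTerms:
    """Component differences in kcal/mol (complex minus receptor minus ligand)."""
    bond: float
    angle: float
    dihedral: float
    ele: float
    vdw: float
    pol: float      # polar solvation (PB or GB)
    non_pol: float  # non-polar solvation

    def e_mm(self) -> float:
        # Eq. (5): bonded plus non-bonded molecular mechanics energy.
        return (self.bond + self.angle + self.dihedral) + (self.ele + self.vdw)

    def g_sol(self) -> float:
        # Eq. (6): polar plus non-polar solvation free energy.
        return self.pol + self.non_pol

    def delta_h(self) -> float:
        # Eq. (4): effective binding energy when the entropy term is omitted.
        return self.e_mm() + self.g_sol()


# Placeholder values for illustration only.
terms = DeltaTerms(bond=0.0, angle=0.0, dihedral=0.0,
                   ele=-40.0, vdw=-55.0, pol=60.0, non_pol=-6.0)
print(terms.e_mm(), terms.g_sol(), terms.delta_h())
```

In this made-up case the favourable electrostatic and van der Waals terms are partly offset by the polar solvation penalty, a pattern commonly seen in such decompositions.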

Results

This study aims to predict the bioactivity of drug compounds capable of inhibiting the activity of DYRK1A, an enzyme involved in the oligomerization of tau protein in the pathophysiology of Alzheimer’s disease, using a QSAR model.

Molecular docking

The final XP docking results revealed that the compounds demonstrated high binding affinities for the DYRK1A protein, as reflected by their docking scores, which ranged from −10.19 kcal/mol to −13.337 kcal/mol (Table 1). Among them, compound 45934388 exhibited the strongest binding affinity, with the lowest docking score of −13.337 kcal/mol. It was followed by compounds CNP0344929, CNP0360040, CNP0309850, and CNP0426983, with docking scores of −12.746, −11.712, −11.656, and −11.416 kcal/mol, respectively. In comparison, the control drug, 46220502, displayed the highest docking score of −6.966 kcal/mol, indicating relatively lower binding affinity. Other compounds, including CNP0289420 and CNP0073875, also showed notable binding affinities, with docking scores of −10.371 kcal/mol and −10.3 kcal/mol, respectively. Binding energy values for the remaining screened compounds are provided in Supplementary Table S4.

Table 1 Docking scores and the MMGBSA binding free energy of compounds against DYRK1A using the Glide module in Maestro 12.5.

3D binding pose

The 3D interactions of the top five hit compounds with the best binding affinity, alongside the control, with the DYRK1A protein are shown in Fig. 1. The binding poses show that the compounds occupy a similar site in the active region of DYRK1A. Based on their binding poses and affinity scores (kcal/mol), compounds 45934388, CNP0344929, CNP0360040, CNP0309850, and CNP0426983 were selected for further assessment of 2D binding interaction and pharmacokinetic properties prediction.

2D binding profile analysis

The examination of the 2D interaction profiles further showed that the active compounds primarily participated in hydrophobic interactions with the residues of DYRK1A (depicted in Fig. 2). Compounds 45934388 and CNP0344929 exhibited hydrophobic interactions with residues including Met240, Glu239, Phe238, Ala186, Val222, Val306, Lys188, Leu294, Val173, and Ile165. However, 45934388 interacted with Gly166, while CNP0344929 interacted with Gly168 and Tyr243, which were absent in 45934388. Compound 45934388 formed hydrogen bonds with Leu241, Asp307, Asn292, Glu291, and Asn244, whereas CNP0344929 formed hydrogen bonds with Leu241, Ile165, Lys167, Glu291, and Glu239.

Compound CNP0360040 demonstrated hydrophobic interactions with a broader range of residues, including Ser310, Phe308, Val306, Asn292, Leu294, Glu203, Ile165, Gly166, Gly168, Lys188, Val173, Ala186, Phe238, Glu239, Val222, and Met240. It also formed hydrogen bonds with Asp287, Lys289, Asp307, Lys167, and Leu241, alongside a pi-pi stacking interaction with Phe170. Compound CNP0309850 exhibited hydrophobic interactions with Lys289, Asn292, Leu294, Val173, Phe170, Lys167, Gly166, Ile165, Phe238, Glu239, Val222, Met240, Tyr243, Asn244, and Asp247. It also formed hydrogen bonds with Asp307, Lys188, Leu241, and Ser242.

Compound CNP0426983 participated in hydrophobic interactions with residues such as Leu294, Glu291, Asn244, Tyr243, Ser242, Met240, Glu239, Phe238, Leu236, Val222, Leu207, Val306, Phe308, Glu203, Ala186, Val173, Ile165, and Gly166. It formed hydrogen bonds with Leu241, Asp307, and Lys188. The control compound, 46220502, exhibited hydrophobic interactions with residues Leu294, Glu291, Val306, Asp307, Asp247, Lys188, Ala186, Asn244, Tyr243, Ser242, Leu241, Met240, Glu239, Phe238, Val173, Phe170, Val222, Lys167, and Gly166. It formed hydrogen bonds with Ile165 and Asn292.

Pharmacokinetics evaluation

The top five hit compounds with the lowest docking scores, along with the control drug (abemaciclib), were assessed for pharmacokinetic properties based on Lipinski’s rule of five (Ro5): molecular weight (MW) ≤ 500 Daltons, LogP (lipophilicity) ≤ 5, hydrogen bond donors (HBD) ≤ 5, and hydrogen bond acceptors (HBA) ≤ 10. The ADMET evaluation results of the lead compounds are presented in Table 2. None of the compounds violated the Ro5, while the control drug had one violation, with a molecular weight of 506.27, slightly exceeding the 500-dalton threshold.
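
The rule-of-five check described above reduces to counting threshold violations. The following is a minimal sketch: only the molecular weight of 506.27 comes from the text, while the LogP, HBD, and HBA values passed in the example are hypothetical placeholders chosen to sit inside the limits.

```python
def ro5_violations(mw: float, logp: float, hbd: int, hba: int) -> int:
    """Count Lipinski rule-of-five violations using the limits in the text."""
    checks = [
        mw > 500,   # molecular weight above 500 Da
        logp > 5,   # LogP above 5
        hbd > 5,    # more than 5 hydrogen bond donors
        hba > 10,   # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)


# Control drug: MW from the text; the other three values are placeholders.
control_violations = ro5_violations(mw=506.27, logp=4.0, hbd=1, hba=8)
print(control_violations)
```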

All lead compounds were predicted to have moderate human intestinal absorption but were not effective Caco-2 permeants, as their scores slightly surpassed the optimal value threshold, except for CNP0360040 and the control drug, 46220502. The compounds were identified as moderate inhibitors of P-glycoprotein, indicating limited potential for drug-drug interactions. Additionally, all compounds except CNP0426983 were predicted to exhibit optimal blood-brain barrier (BBB) permeability, an essential feature for brain-targeting in Alzheimer’s disease (AD) treatment.

In terms of Phase I metabolism, the compounds were potent inhibitors of CYP2D6 but moderate inhibitors of CYP2C9 and CYP3A4. They were also identified as substrates for these enzymes, suggesting minimal inhibitory effects. The compounds demonstrated moderate inhibition of the human ether-a-go-go-related gene (hERG), indicating a lower risk of cardiac arrhythmia. Furthermore, the compounds were predicted to exhibit moderate mutagenicity and a reduced likelihood of inducing liver injury. Overall, the lead compounds showed favourable ADMET profiles, making them promising candidates for further drug development. The ADMET profile values of other screened compounds are presented in supplementary Table S5. Given their predicted BBB permeability, all hit compounds and the control drug were subjected to molecular dynamics (MD) simulations.

Table 2 ADMET prediction of abemaciclib and the hit compounds with good binding against DYRK1A using the AI Drug Lab server (https://ai-druglab.smu.edu/).

Predicted biological activities of hit compounds

Ten models were generated to assess the biological activities of the compounds, with the best-performing model selected for further analysis. The performance metrics of all developed models are provided in Supplementary Table S6. The models differed considerably in prediction accuracy, with R² values spanning from 0.40 to 0.72 and Q² values ranging from 0.44 to 0.55. The kpls_radial_4 model attained a high R² (0.6947), but exhibited weaker predictive robustness (Q² = 0.5512) compared to other models, whilst the kpls_radial_46 model had a strong fit (R² = 0.7179) but received a lower ranking score (0.4369). The kpls_molprint2D_49 model, derived from kernel-based partial least squares (KPLS) regression, demonstrated the best predictive performance, with R² = 0.5245, Q² = 0.5228, an RMSE of 0.7566 in external validation, a standard deviation of 0.7640 in internal validation, and a ranking score of 0.5263. These values, although moderate, fall within the thresholds generally accepted for predictive QSAR models42. Based on this, the kpls_molprint2D_49 model was selected for further predictions of the hit compounds’ pIC50 values.
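
For readers unfamiliar with the metrics, R² and RMSE can be computed from observed and predicted values as sketched below. This uses the common definition R² = 1 − SS_res/SS_tot, which may differ in detail from the statistics reported by AutoQSAR; applying the same function to held-out test compounds gives a Q²-style estimate. The data are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up pIC50 values for illustration only.
y_obs = [5.0, 6.0, 7.0, 8.0]
y_hat = [5.5, 5.5, 7.5, 7.5]
print(round(r_squared(y_obs, y_hat), 3), round(rmse(y_obs, y_hat), 3))
```

A small gap between the training R² and the test-set value, as reported for the selected model, suggests the fit is not driven by overfitting.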

Figure 3 presents a scatter plot illustrating the model’s performance in predicting the pIC50 values for the training and test sets. As shown in Table 3, the control drug exhibited the highest pIC50 value, while the hit compounds displayed pIC50 values comparable to that of the control drug. Predicted activities varied from pIC50 5.75 to 6.16 for the novel compounds, compared with 6.32 for 46220502 (control). Compound 45934388 showed the highest predicted activity (pIC50 = 6.16), which is very close to that of the reference drug, whereas CNP0360040 exhibited the lowest predicted activity (pIC50 = 5.75). Descriptor analysis showed that most compounds had LogP values between 2.8 and 3.9. PSA values varied from 96.2 to 152 Ų, with 46220502 (control) having the lowest at 75 Ų.

Table 3 Chemical structures, predicted pIC50 values, and QSAR characteristics of the top five candidate DYRK1A inhibitors and the reference compound, abemaciclib.

Molecular dynamics simulation

Protein and ligand RMSDs

The RMSD is a fundamental measure used to assess the stability and flexibility of molecular systems during simulations43,44. For proteins, RMSD quantifies the displacement of backbone atoms from their initial positions, providing valuable insights into changes in structure and overall stability. Ligand RMSD, on the other hand, tracks the movement of the ligand relative to the protein, reflecting the degree of stability in the ligand’s binding interaction. When both protein and ligand RMSD are analyzed together, they offer a comprehensive view of the dynamic nature of protein-ligand interactions. In this study, the interactions between DYRK1A kinase and six different ligands (45934388, CNP0344929, CNP0360040, CNP0309850, CNP0426983, and 46220502, used as a control) were investigated through molecular dynamics simulations. Each ligand exhibited unique RMSD trends (Fig. 4), influenced by factors such as the ligand’s binding affinity and the structural flexibility of DYRK1A kinase.
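
The RMSD described above is the square root of the mean squared atomic displacement from a reference frame. The following is a minimal sketch, assuming each frame has already been superimposed on the reference; a real analysis would first align on the protein backbone, which is not shown here. The coordinates are toy values in Å.

```python
import math


def rmsd(reference, frame):
    """RMSD between two equally sized lists of (x, y, z) coordinates."""
    if len(reference) != len(frame):
        raise ValueError("frames must contain the same number of atoms")
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(reference, frame):
        total += (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
    return math.sqrt(total / len(reference))


# Toy example: every atom is shifted by 1 Å along z.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(ref, moved))
```

Evaluating this function for every frame against the first frame yields the RMSD-versus-time traces discussed for each complex.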

The graph in Fig. 4A illustrates the RMSDs of both the DYRK1A kinase protein and its ligand, 45934388, over a 200-ns molecular dynamics simulation. The protein RMSD curve oscillates around a relatively stable mean value (approximately 1.5 to 2.0 Å) after an initial equilibration period. This stability indicates that the protein has settled into a stable conformation after the initial phase of the simulation. In contrast, the ligand’s RMSD shows more fluctuation compared to the protein, ranging roughly from 1.0 to 3.5 Å throughout the simulation. Such variability may indicate that the ligand experiences more conformational changes or adjustments as it interacts with the protein. There appears to be a weak correlation between the ligand and protein RMSD. Large changes in ligand RMSD do not consistently correspond to changes in protein RMSD. This may suggest that while the ligand’s conformational dynamics affect the overall interaction, they are not tightly coupled.

The protein RMSD in Fig. 4B fluctuates between approximately 0.3 Å and 2.7 Å, with an average that appears to hover around 1.75 Å. This is a relatively low RMSD range, indicating that the protein maintains a stable conformation overall. The significant increase in ligand RMSD values, particularly peaking at 8 Å during the first 55 ns, raises concerns about the stability of the ligand’s binding. This fluctuation may indicate that the ligand is experiencing various binding poses or an unstable interaction with the protein. However, for the majority of the simulation, the ligand remains stable within 5 Å. Notably, after 175 ns, the ligand RMSD decreases to approximately 3.5 Å, suggesting that CNP0344929 begins to align more closely with its initial binding conformation.

In further detail, the protein RMSD in Fig. 4C shows an initial increase and fluctuation during the first half of the simulation, which may indicate a phase of structural adjustment as the protein encounters its surroundings. After this initial period, the protein RMSD settles into a stable range between approximately 1.0 Å and 2.0 Å for the remainder of the simulation. This stabilization highlights that the protein maintains a relatively consistent conformation over time, suggesting that the simulation successfully allows the protein’s structure to equilibrate effectively. Conversely, the ligand RMSD starts at around 0.8 Å and demonstrates relative stability in the range of 1.0–3.2 Å between 25 and 50 ns. This indicates that the ligand initially adopts a stable configuration within the binding pocket. However, as the simulation progresses, particularly after approximately 50 ns, the ligand RMSD shows a slight reduction, fluctuating around 1.6 Å. It indicates that the ligand experiences some minor conformational adjustments during this period. Following this fluctuation, the ligand’s RMSD increases again, reaching up to 2.8 Å, and then stabilizes from about 100 ns until the end of the simulation at 200 ns. The overall behavior of the ligand during the second half of the simulation is characterized by a stabilization phase despite the earlier fluctuations, indicating that the ligand may have found a more favorable binding conformation or is effectively interacting with the protein in a more settled state.

In Fig. 4D, after the initial phase, the protein RMSD stabilizes and maintains consistent values, mostly around 1.0 Å to 1.5 Å, indicating that the protein has settled into a stable conformation. The ligand RMSD, while demonstrating smaller fluctuations, exhibits more dynamic behavior. It maintains a level of stability between 1.0 Å and 1.5 Å for a considerable duration but also showcases brief spikes (reaching up to 2.25 Å), suggesting it may explore different conformations within the binding pocket. The plot illustrates that while both curves move in rhythm at times, they exhibit distinct behaviors overall, with the ligand showing greater variability.

In the case of the CNP0426983 complex (Fig. 4E), the protein RMSD initially exhibits high fluctuation, remaining between 1.5 Å and 2.8 Å during the first 80 ns of the simulation. After 80 ns, the RMSD shows a noticeable reduction in fluctuation, reaching a more stable state around 2.5 Å, indicating that the protein maintains a more consistent structural conformation for the rest of the simulation. This plateau-like behavior signifies that the protein has settled into a stable conformation after initial structural adjustments. The ligand RMSD shows a different trend, with an initial rise during the first 10 ns, followed by a period of stability until about 50 ns. However, a peak appears around 55 ns (reaching up to 2.8 Å), suggesting a transient conformational shift or a more significant interaction with the protein. After this peak, the ligand RMSD gradually decreases and reaches a plateau around 63 ns, though with considerable fluctuation until the 160 ns mark. This indicates that the ligand is still undergoing significant structural adjustments or is in a dynamic equilibrium with the protein. After 160 ns, the ligand RMSD decreases and stabilizes with much less fluctuation (around 3.2 Å), indicating that the ligand has found a more stable binding conformation within the protein’s active site by the end of the simulation.

Finally, for the control (46220502)-bound complex (Fig. 4F), the protein RMSD (blue) shows a clear progression over time. Initially, the RMSD increases from approximately 1.2 Å to 2.4 Å within the first 10 ns, suggesting a period of equilibration where the protein undergoes structural adjustments to stabilize in its environment. Between 10 ns and 25 ns, the RMSD slightly decreases to about 1.9 Å, indicating a brief phase of reduced flexibility as the protein begins to settle. Afterward, the RMSD begins to rise again, reaching a relatively stable plateau until around the 120-ns mark. Beyond this point, the RMSD gradually increases, reaching approximately 3.0 Å. This gradual rise reflects ongoing conformational adjustments as the protein moves toward a stable state, adapting to its surrounding environment and potentially the ligand binding. A similar trend is observed in the ligand RMSD, where it tracks the protein’s movements but with a notable difference: around 137 ns, there is a sudden dip in the ligand’s RMSD. This dip could indicate a transient structural change or an adjustment in the ligand’s position within the binding pocket, perhaps in response to changes in the protein structure or to facilitate stronger binding.

Across the six complexes, the RMSD traces demonstrate a consistent ranking of dynamic stability. The control drug showed the most compact protein RMSD profile upon equilibration, consistent with a well-anchored binding mode. Among the hit compounds, CNP0360040 and 45934388 display the most favourable ligand behaviour: after initial equilibration, their ligand RMSDs remain largely stable below ~3 Å for the majority of the production run, which indicates that they maintain their binding conformations rather than undergoing recurrent unbinding events. By comparison, CNP0344929 has the highest early ligand excursions (peaking near ~8 Å in the initial 55 ns) and only progressively settles toward reduced RMSD values later in the trajectory. CNP0309850 and CNP0426983 display intermediate patterns, with intermittent RMSD spikes but general equilibration. In practical terms, sustained low ligand RMSD after equilibration (as observed for CNP0360040 and 45934388) supports a stable bound pose and a higher likelihood of productive engagement with DYRK1A, whereas large early spikes (as for CNP0344929) indicate transient binding or conformational searching that lowers confidence in that hit despite the acceptable docking score.

Protein RMSFs

RMSF serves as an important measure for evaluating the flexibility of various regions within a protein structure45,46. Higher RMSF values indicate greater flexibility, while lower values reflect more stable and rigid regions47,48. Regions such as the N-terminal and C-terminal often show elevated RMSF values, primarily due to their exposure to the solvent and weaker internal interactions49. Similarly, loop regions, which lack the defined structure of α-helices or β-sheets, tend to exhibit higher RMSF values as they require flexibility to perform critical functions like ligand binding or facilitating protein-protein interactions. On the other hand, structured regions like α-helices and β-sheets usually display lower RMSF values, reflecting their more stable and rigid nature50.
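
Whereas RMSD compares whole frames, RMSF is computed per atom (or per residue) as the root mean square deviation from that atom's time-averaged position. The sketch below is a minimal version under the same assumption of pre-aligned frames; the two-atom trajectory is a toy example in Å.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as frames of (x, y, z) tuples."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared deviation of atom i from its average position.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msf))
    return values


# Toy trajectory: atom 0 hops between x = 0 and x = 2, atom 1 never moves.
traj = [
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
print(rmsf(traj))
```

The mobile atom in this example plays the role of a flexible loop or terminus, while the fixed atom resembles a residue held rigid within a helix or sheet.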

For the DYRK1A kinase in complex with the six ligands (Figs. 5A-F), RMSF analysis of the binding residues reveals relatively low values, typically under 3.0 Å. This indicates that these residues become more stable upon ligand binding, reducing their flexibility. Ligand binding often results in structural rearrangements that stabilize the protein, ensuring a stronger and more specific interaction between the kinase and its ligands. The RMSF profiles complement the RMSD-based ranking by indicating that the most dynamically stable ligands confer the greatest local stabilization of the binding pocket. In the complexes with 46220502 (control), CNP0360040 and 45934388, residues in the hinge and ATP pocket region (particularly Glu239, Leu241, Phe238 and adjacent backbone positions) show considerably reduced fluctuations. The lower RMSF in these areas demonstrates how ligand binding induces rigidity at the enzyme’s active site. By contrast, complexes such as CNP0344929 sustain greater flexibility in some pocket-adjacent loops for longer periods of the simulation, consistent with the larger ligand RMSD excursions noted above.

While the binding residues show reduced RMSF, other parts of the DYRK1A kinase, such as the flexible C- and N-terminal lobes and loop regions, may still show higher RMSF values. These regions remain more flexible as they are not directly involved in ligand binding, and their dynamic nature is important for the protein’s overall functionality and regulation. This distinction between the stabilized binding site and the more flexible regions highlights how ligand binding can alter the protein’s overall structural flexibility, potentially enhancing or inhibiting its activity, depending on the nature of the ligands.

Protein-ligand contacts

Protein-ligand interactions with the DYRK1A kinase were tracked throughout the simulation for six different ligands and classified into hydrogen bonds (green), hydrophobic interactions (purple), ionic interactions (magenta), and water bridges (blue). Figure 6 summarizes these interactions using stacked bar charts, normalized over the simulation trajectory, where values indicate the fraction of simulation time an interaction was maintained. For instance, a value of 0.7 signifies that the interaction persisted for 70% of the simulation, while values exceeding 1.0 (or 100%) denote cases where a single residue forms multiple interactions of the same subtype with the ligand. Additionally, Fig. 7 provides a 2D schematic illustrating detailed ligand atom interactions with DYRK1A kinase residues, displaying only interactions that persist for at least 30% of the simulation time. This diagram highlights key binding features, including cases where residues establish multiple interactions with the same ligand atom, leading to interaction percentages greater than 100%. To streamline the analysis and focus on the most relevant interactions, this section discusses only interactions that meet or exceed the 30% threshold.
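The normalization described above can be sketched as follows. This is a simplified illustration, not the analysis tool used in this study: the per-frame contact lists, the function names `contact_occupancy` and `persistent`, and the toy data are all hypothetical. It shows why an occupancy can exceed 1.0 when one residue makes several contacts of the same type in a single frame, and how the 30% cut-off is applied.

```python
from collections import Counter

def contact_occupancy(frames):
    """Return {(residue, interaction_type): contacts per frame}.

    `frames` is a list with one entry per trajectory frame; each entry
    lists the (residue, interaction_type) contacts seen in that frame.
    Repeated contacts within a frame are all counted, so a value can
    exceed 1.0.
    """
    counts = Counter()
    for contacts in frames:
        counts.update(contacts)
    n_frames = len(frames)
    return {key: count / n_frames for key, count in counts.items()}

def persistent(occupancy, threshold=0.30):
    """Keep only contacts that meet or exceed the occupancy threshold."""
    return {key: value for key, value in occupancy.items() if value >= threshold}

# Hypothetical four-frame trajectory.
frames = [
    [("Glu239", "hbond"), ("Glu239", "hbond"), ("Phe238", "hydrophobic")],
    [("Glu239", "hbond"), ("Phe238", "hydrophobic"), ("Leu241", "hbond")],
    [("Glu239", "hbond"), ("Phe238", "hydrophobic")],
    [("Glu239", "hbond")],
]
occupancy = contact_occupancy(frames)
kept = persistent(occupancy)
print(occupancy)
print(sorted(kept))
```

Here the Glu239 hydrogen bond reaches 1.25 because two bonds are present in the first frame, while the Leu241 contact (0.25) falls below the threshold and is dropped.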

The dynamics simulations of 45934388 (Figs. 6A and 7A) over 200 ns reveal several key interactions with the DYRK1A kinase that remained consistent throughout the simulation. The formyl C=O forms strong hydrogen bonds with Leu241 (70%), while the 9-OH, which initially bonded with Leu241 during docking, now interacts with Glu239 (94%), highlighting their crucial role in binding stability. Additionally, a moderate water bridge was observed between the 3-OH and Glu291 (33%), suggesting the involvement of water molecules in facilitating these interactions. Hydrophobic contacts with Val173 and Phe238 were also prominent, further reinforcing the binding stability of 45934388 throughout the simulation.

For CNP0344929 (Figs. 6B and 7B), the dynamics simulations confirm stable binding with DYRK1A kinase through several key interactions that remained consistent throughout the simulation. A newly formed hydrogen bond between the 4-OH (acting as an H-bond donor) and Glu239 (75%) was observed, replacing the initial docking interaction with Leu241, which was maintained for only 19% of the time. This shift highlights the crucial role of Glu239 in the ligand’s binding stability. Additionally, strong hydrophobic interactions with Ala186, Phe238, and Val306 further reinforce the ligand’s overall binding strength.

The dynamics simulations of CNP0360040 (Figs. 6C and 7C) over 200 ns reveal several critical interactions with DYRK1A kinase, including the retention of key interactions from the docking step. Strong hydrogen bonds were maintained between the chromen C=O and Lys188 (96%), as well as between the para-substituted –OH in ring B and Leu241, acting as both an H-bond donor (52%) and acceptor (65%), reinforcing their crucial role in stabilizing the ligand within the binding pocket. Additionally, a new hydrogen bond formed between the 3-OH and Ile265 (38%), replacing its previous interaction with Ser242 observed during docking. Furthermore, newly emerged hydrophobic interactions with Ala186 and Phe238 contributed moderately to the ligand’s overall binding stability throughout the simulation.

For CNP0309850 (Figs. 6D and 7D), the molecular dynamics simulations reveal the retention of crucial interactions initially observed during docking, highlighting the stability of the ligand within the binding site. Specifically, the two hydrogen bonds formed between Leu241 and the 4- and 5-OH groups (for approximately 90% and 42% of the simulation time, respectively) and between the 5-OH group and Glu239 (97%) remain intact. Additionally, the ortho-substituted hydroxyl group maintains a stable hydrogen bond with Glu291 throughout approximately 93% of the simulation time. These interactions, which were also observed during docking, contribute to the overall stability of the ligand in the binding site. Interestingly, although the para-substituted -OH group did not engage in direct hydrogen bonding during the docking process, the dynamics simulations reveal that it forms two water-mediated bridges with Asp287 and Asp307. These interactions are present for roughly 53% of the simulation time, suggesting that the ligand undergoes a dynamic rearrangement during the simulation.

The dynamics simulations of CNP0426983 (Figs. 6E and 7E) reveal the formation of new interactions with DYRK1A kinase, while previously observed interactions from the docking step are lost. Specifically, the hydrogen bonds between the carboxylic –OH group and Leu241, the phenolic –OH and Lys188, and the ketone C=O and Asp307, which were key stabilizing interactions during docking, are no longer present in the dynamics simulations. However, the phenolic –OH group continues to form a water-mediated bridge with Asp307, maintaining some level of interaction (31%) despite the loss of the direct hydrogen bond. In addition, the ligand exhibits a combination of newly formed interactions, primarily on the substituted sides of the molecule. These include hydrogen bonds, water bridges, and hydrophobic interactions, which are present for varying durations (ranging from 10% to 24% of the simulation time). Furthermore, one pi-stacking interaction between the phenolic ring and Phe170 is observed for approximately 14% of the simulation time.

Unlike the hit compounds, the dynamics simulations of 46220502 (Figs. 6F and 7F) reveal the formation of new interactions with DYRK1A kinase, further emphasizing the ligand’s dynamic behavior within the binding site. The H-bond initially observed between the pyridine N and Ile165 during docking shifts to Leu241 in the simulations. However, this interaction is retained for only 19% of the simulation time, highlighting the transient nature of this particular bond and the overall dynamic flexibility of the ligand-protein interaction. Despite the presence of multiple interactions involving four of its five rings (pi-stacking, H-bonds, ionic bonds, water bridges, and other hydrophobic interactions), the stability of these interactions is relatively low, lasting between 5% and 24% of the simulation time. This suggests that, although the control ligand participates in a variety of interaction types that contribute to its binding, these interactions are not as persistent as those observed for other compounds.

The contact-occupancy analysis shows the molecular basis of the RMSD/RMSF trends. Low ligand RMSD and decreased pocket RMSF are associated with high-occupancy hydrogen bonds to hinge-region residues (e.g., the strong interactions between 45934388 and Leu241/Glu239 - around 70% and 94% occupancy respectively - and the highly robust Lys188 and Leu241 contacts for CNP0360040). Hydrophobic packing with residues such as Phe238, Val173, Ala186 and Val306 further enhances prolonged binding for these ligands. By contrast, compounds that lose key docking interactions during dynamics (e.g., CNP0426983 losing several initial H-bonds and relying more on transient water bridges or weak π-stacking) or that exhibit transient, low-occupancy contacts (as seen for some interactions of 46220502 (control)) correspond to higher RMSD spikes and less consistent pocket stabilization.

MM/PBSA calculations

To assess the binding affinity of DYRK1A with the identified hit compounds, the MM/PBSA method was employed. This computational approach is commonly used to estimate the free energy of binding between a protein and its ligand by integrating molecular mechanics energies with solvation energies. The MM/PBSA technique helps to break down the contributions of different interactions—such as van der Waals forces, electrostatic interactions, and solvation effects—to the overall binding affinity51,52,53.
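The way these components combine can be sketched with a few lines of arithmetic. The example below assumes the usual decomposition in which the gas-phase term is the sum of the van der Waals and electrostatic energies, and the total is the gas-phase term plus the solvation term (any entropy contribution is omitted). The function name `mmpbsa_total` and the numerical inputs are hypothetical and are not values from Table 4.

```python
def mmpbsa_total(e_vdw, e_ele, g_solv):
    """Combine MM/PBSA components (all in kcal/mol).

    Assumed decomposition (entropy term omitted):
        dG_gas   = dE_vdW + dE_ele
        dG_total = dG_gas + dG_solv
    """
    g_gas = e_vdw + e_ele
    g_total = g_gas + g_solv
    return g_gas, g_total

# Hypothetical component values, chosen only to illustrate the bookkeeping.
g_gas, g_total = mmpbsa_total(e_vdw=-40.0, e_ele=-20.0, g_solv=25.0)
print(g_gas, g_total)
```

With these illustrative inputs, favourable gas-phase interactions of −60 kcal/mol are partly offset by a 25 kcal/mol desolvation penalty, giving a total of −35 kcal/mol; a large solvation penalty can therefore cancel much of an apparently strong electrostatic contribution.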

In this study, MM/PBSA calculations were performed for the interactions between DYRK1A kinase and both the control ligand, abemaciclib, and the hit compounds. The results, summarized in Table 4, offer important insights into the binding affinities of these compounds. The table presents the calculated binding free energies (ΔGTotal), which reflect the strength of the interactions between the ligands and DYRK1A kinase.

Table 4 MM/PBSA calculations for DYRK1A kinase with hit compounds and control ligand.

The ranking of total free energy (ΔGTotal) for the six compounds highlights distinct stability patterns, where more negative ΔGTotal values signify higher thermodynamic favorability. Among them, compound 46220502 (control) demonstrated the highest stability, with a ΔGTotal of −49.86 ± 5.05 kcal mol⁻¹, underscoring its optimal balance between interaction forces and solvation effects. Close contenders include CNP0360040 and 45934388, with ΔGTotal values of −46.67 ± 3.48 and −42.04 ± 3.54 kcal mol⁻¹, respectively, indicating relatively stable energetic profiles with minor variances. Meanwhile, CNP0426983 and CNP0309850 exhibited slightly weaker stability, reflected in their ΔGTotal values of −40.31 ± 4.58 and −38.17 ± 4.11 kcal mol⁻¹, respectively. At the lower end of the spectrum, CNP0344929 emerged as the least stable compound, with a ΔGTotal of −33.30 ± 6.21 kcal mol⁻¹, suggesting significant energetic drawbacks that reduce its overall thermodynamic favorability.

A deeper breakdown of energetic components further elucidates these stability rankings. Compound 46,220,502 benefited from the most favourable van der Waals interactions (ΔEvdW = − 59.23 ± 5.90 kcal mol⁻¹), which significantly contributed to its stability. In contrast, CNP0344929 had the least advantageous van der Waals energy, potentially explaining its lower stability ranking. Electrostatic interactions (ΔEEle) also played a crucial role, with CNP0344929 exhibiting particularly strong electrostatic energy (ΔEEle = − 39.75 ± 15.24 kcal mol⁻¹). However, despite a favorable ΔEEle of − 34.77 ± 8.77 kcal mol⁻¹, CNP0344929 suffered from a high desolvation penalty (41.67 ± 5.80 kcal mol⁻¹), which counteracted any stabilizing effects of its electrostatic contributions.

Solvation energy (ΔGSolv) also proved to be a significant determinant of stability among these compounds. CNP0344929 experienced the most unfavorable solvation energy (ΔGSolv = 41.67 ± 5.80 kcal mol⁻¹), highlighting a major desolvation burden that negatively influenced its overall stability. In contrast, 46,220,502 exhibited the lowest solvation penalties, allowing it to achieve a more favorable energy balance. Additionally, gas-phase energy (ΔGGas) played a pivotal role, with CNP0344929 once again showing the least favorable gas-phase energy (ΔGGas = − 60.43 ± 14.98 kcal mol⁻¹). Meanwhile, CNP0360040 displayed more advantageous gas-phase interactions (− 80.69 ± 7.58 kcal mol⁻¹), enhancing its thermodynamic stability.

Principal component analysis (PCA)

Principal Component Analysis (PCA) is a statistical technique widely used in MD simulations to analyze and visualize essential molecular motions. By identifying the most significant modes of motion, PCA reduces the complexity of the data, allowing for a clearer understanding of the system’s conformational dynamics49,54,55. In MD simulations, PCA extracts principal components (PCs) that capture the largest variance in molecular motion, with PC1 typically representing the most dominant mode of conformational change.

Figure 8 illustrates the contributions of the first three principal components (PC1, PC2, and PC3) to the total conformational variation of the top five hit compounds—45,934,388, CNP0344929, CNP0360040, CNP0309850, and CNP0426983—alongside 46,220,502, used as a control. The percentage of variance explained by the first three PCs for each compound is as follows: 38.9% (45934388), 41.1% (CNP0344929), 49.4% (CNP0360040), 34.2% (CNP0309850), 57.8% (CNP0426983), and 50.8% (46220502). These values indicate the extent to which the primary modes of motion contribute to the overall system dynamics, with higher percentages suggesting a greater concentration of motion within the leading PCs.
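The variance percentages quoted above follow from the eigenvalues of the coordinate covariance matrix. The sketch below shows one generic way to obtain them with NumPy; it is not the workflow used for Figure 8. It assumes an aligned trajectory array of shape (frames, atoms, 3), and the function name `pca_explained_variance` and the random toy data are hypothetical.

```python
import numpy as np

def pca_explained_variance(coords, n_components=3):
    """PCA of an aligned trajectory of shape (n_frames, n_atoms, 3).

    Returns the fraction of variance captured by each of the leading
    components and the projection of every frame onto them.
    """
    x = np.asarray(coords, dtype=float).reshape(len(coords), -1)  # flatten to (frames, 3N)
    x = x - x.mean(axis=0)                        # remove the average structure
    cov = np.cov(x, rowvar=False)                 # (3N, 3N) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    projections = x @ eigvecs[:, :n_components]   # frame coordinates along PC1..PCn
    return ratio[:n_components], projections

# Hypothetical toy data: 50 frames, 4 atoms, with larger motion along x.
rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 4, 3)) * np.array([3.0, 1.0, 0.5])
ratio, projections = pca_explained_variance(toy)
print("variance explained by PC1-PC3 (%):", 100 * ratio.sum())
```

Plotting the columns of `projections` against each other, coloured by frame index, would give pairwise PC plots of the kind described in this section.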

To visualize these conformational variations, the top three PCs were plotted in pairs (PC1/PC2, PC1/PC3, and PC2/PC3), providing information about the dynamic behavior of each compound throughout the simulation. The colour scheme in the plots tracks simulation time: the smooth gradient from blue through white to red follows each compound from the start to the end of the trajectory, highlighting how structural transitions occur over time. Both 45934388 (Fig. 8A) and CNP0344929 (Fig. 8B) exhibited gradual conformational transitions in their PC1/PC2 plots, as indicated by the smooth blue–white–red color shifts. Notably, CNP0344929 displayed a broader distribution of points than 45934388: it was clustered around PC1 = +15 (blue) at the start of the simulation and gradually transitioned to a different cluster at PC1 = −15 (red), reflecting a continuous and progressive conformational change over time.

In contrast, compounds CNP0360040 and CNP0309850 exhibited more overlapping transitions along the PC1 axis. Their clusters, representing different time points through color variations, maintained similar PC1 values throughout the simulation, indicating minimal structural variation in the conformational space. For CNP0360040 (Fig. 8C), the initial simulation step revealed a more anisotropic distribution, with blue and red points overlapping, though a light blue region was observed at + 20. Meanwhile, CNP0309850 (Fig. 8D) displayed even greater stability, with red and blue dots consistently occupying similar regions, signifying less structural fluctuation.

CNP0426983 (Fig. 8E) demonstrated an even more pronounced conformational transition than both 45934388 (Fig. 8A) and CNP0344929 (Fig. 8B), as evident in its PC1/PC2 and PC1/PC3 plots, which shifted from +30 to −10. Similarly, 46220502 (Fig. 8F) exhibited smooth transitions along both PC1/PC2 and PC1/PC3 axes, spanning from −10 to +30. These patterns suggest that both compounds underwent more extensive yet continuous structural rearrangements throughout the simulation.

Dynamic cross-correlation matrix (DCCM)

DCCM is a statistical tool used in MD simulations to assess the interdependence of atomic movements within a system, such as a protein or protein-ligand complex50,54,56,57. It measures how the movements of pairs of atoms are correlated over time, resulting in a matrix where each element represents the degree of correlation between the motions of two atoms. Figure 9 illustrates the DCCM for DYRK1A kinase in complex with various ligands during 200-ns simulations. The map is color-coded, with blue regions indicating positively correlated movements, where the motions of the two atoms are synchronized in the same direction. Pink regions, on the other hand, represent negatively correlated movements, where the atoms move in opposite directions. The diagonal of the matrix represents self-correlations, which are always equal to 1, while the off-diagonal elements show the correlation between different pairs of atoms. White regions, where the correlation is zero, indicate independent motions, meaning there is no direct relationship between the movements of those atoms. This analysis provides valuable insights into how different parts of the protein or protein-ligand complex move relative to one another, highlighting cooperative and independent motions.
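A minimal sketch of how such a matrix can be computed is given below. It assumes the common normalized form, in which each element is the time-averaged dot product of two atoms' displacement vectors divided by the product of their root-mean-square displacements, and it assumes pre-aligned frames. The function name `dccm` and the three-atom toy trajectory are hypothetical.

```python
import numpy as np

def dccm(coords):
    """Dynamic cross-correlation matrix from (n_frames, n_atoms, 3) coordinates.

    C[i, j] = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>), where dr is the
    displacement from the mean position. Atoms that never move would
    give a division by zero and are not handled in this sketch.
    """
    coords = np.asarray(coords, dtype=float)
    disp = coords - coords.mean(axis=0)                        # displacements, (F, N, 3)
    cov = np.einsum("fik,fjk->ij", disp, disp) / len(coords)   # <dr_i . dr_j>
    norm = np.sqrt(np.diag(cov))                               # RMS displacement per atom
    return cov / np.outer(norm, norm)

# Hypothetical toy system: atoms 0 and 1 move together, atom 2 moves oppositely.
shift = np.array([1.0, -1.0, 1.0, -1.0])
toy = np.zeros((4, 3, 3))
toy[:, 0, 0] = 0.0 + shift
toy[:, 1, 0] = 5.0 + shift
toy[:, 2, 0] = 10.0 - shift
corr = dccm(toy)
print(np.round(corr, 2))
```

In this toy case the diagonal is 1, the pair moving together has a correlation of +1, and the pair moving in opposite directions has a correlation of −1, matching the colour conventions described for Figure 9.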

In the 45934388-bound complex (Fig. 9A), the DCCM analysis reveals distinct anti-correlated movements between the β1 (Tyr35-Gly44) and β2 (Gly47-Asp54) regions and αC (Lys69-Lys88), as well as between β1, β2, and the hinge or rim regions (β5, β6, and αD). In addition, β2 is positively correlated with αG (Asn241-Gly254) and αL (Pro257-Asp262), indicating synchronized motion between these regions, which may facilitate communication between functional domains. β3 (Glu59-Ile66) is anti-correlated with the activation loop (Gly191-Gln199), αG, αL, and αL’, showing that these regions move in opposite directions. β7 (Ile169-Leu171) and β8 (Ile179-Ile181) are positively correlated with αC, β5, and αE (Ser134-Ala153), indicating coordinated movement that might be important for maintaining structural integrity during conformational changes. Lastly, the activation loop shows positive correlations with αF (Leu216-Gly233), αG, and αL, suggesting that the loop and surrounding regions undergo coordinated transitions, likely involved in the kinase’s catalytic activity.

The DCCM analysis of the CNP0344929-bound complex (Fig. 9B) reveals a combination of anti-correlated and correlated movements across various regions of the protein, indicating dynamic and flexible behavior. β1 and β2 show anti-correlation with αC and the hinge or rim regions, suggesting opposing motions that provide structural flexibility. Similarly, β3 is anti-correlated with the activation loop, αG, αL, and αL’ (Lys266-Phe270), reflecting opposing adjustments that may be important for functional changes. The anti-correlation of αD and αE with αL and αL’ suggests distinct movements in these regions that maintain protein flexibility, while the activation loop shows a positive correlation with αG and αL, indicating coordinated movements relevant to enzymatic function and ligand binding.

Compared to 45934388 and CNP0344929 (Figs. 9A and B), CNP0360040 (Fig. 9C) induces more subtle correlations and anti-correlations, suggesting a potentially stronger destabilizing effect on the kinase. This behavior could alter the protein’s conformational flexibility and functional dynamics, possibly leading to a less stable complex. The DCCM for DYRK1A bound to CNP0309850 (Fig. 9D) exhibits the least pronounced correlation and anti-correlation patterns among the complexes, indicating reduced or less coordinated residue-residue communication. This could suggest that the ligand stabilizes certain regions, such as the ATP-binding pocket or activation loop, while leaving other areas of the protein unaffected or decoupled. Alternatively, this ligand may not induce the same level of structural constraints seen with other ligands (Figs. 9A-C), resulting in a more relaxed conformation.

In the CNP0426983-bound complex (Fig. 9E), the DCCM shows pronounced correlated and anti-correlated movements, indicating that the ligand holds the kinase in a highly flexible conformation with maximal dynamic fluctuations. The correlated movements between regions such as αC, αD, and the activation loop are intensified, suggesting greater cooperativity between these regions. The anti-correlated movements between β1, β2, and αC, as well as between β3 and αG, reflect distinct opposing motions, with the protein adjusting to accommodate the ligand in a highly dynamic and nonuniform fashion. This behavior is similar to that observed for the control ligand (Fig. 9F), which also exhibits high dynamics and enhanced residue-residue communication, indicating that the ligand induces a more flexible, hinge-like motion. These dynamics likely play a role in enhancing the kinase’s activity or flexibility, allowing it to adapt to different binding states.

Free-energy landscape (FEL)

FELs, derived from MD trajectories, are essential tools for visualizing the relationship between structural coordinates (e.g., RMSD and radius of gyration, RG) and their corresponding free energy states54,58,59,60. FELs highlight the most stable conformational states, known as energy minima, and uncover the pathways that connect these states. This information is crucial for assessing the stability and flexibility of protein-ligand complexes, providing insights into binding mechanisms and energetic preferences59,60. Figure 10 showcases the FELs for DYRK1A kinase bound to six different ligands, with each plot illustrating the stability and conformational diversity of the kinase-ligand complexes. In the plots, the dark blue regions represent the energy minima, indicating the most stable conformations with the lowest free energy, while the red regions correspond to higher energy states, which are less favorable and signify less stable conformations.
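One common way to build such a landscape is Boltzmann inversion of the two-dimensional histogram of RMSD and RG, which can be sketched as follows. This is a generic illustration rather than the exact procedure behind Figure 10: the temperature of 300 K, the bin count, the function name `free_energy_landscape`, and the synthetic RMSD and RG series are all assumptions.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1

def free_energy_landscape(rmsd, rg, bins=30, temperature=300.0):
    """Free energy on an (RMSD, RG) grid via G = -kT ln(P / P_max).

    The most populated bin is set to 0 kJ/mol; bins that were never
    sampled come out as +inf.
    """
    hist, rmsd_edges, rg_edges = np.histogram2d(rmsd, rg, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):            # log(0) for unsampled bins
        g = -K_B * temperature * np.log(prob / prob.max())
    return g, rmsd_edges, rg_edges

# Synthetic series in nm, standing in for values extracted from a trajectory.
rng = np.random.default_rng(1)
rmsd_series = rng.normal(0.19, 0.01, size=5000)
rg_series = rng.normal(2.22, 0.005, size=5000)
g, rmsd_edges, rg_edges = free_energy_landscape(rmsd_series, rg_series)
i, j = np.unravel_index(np.argmin(g), g.shape)
print("minimum near RMSD", rmsd_edges[i], "nm and RG", rg_edges[j], "nm")
```

The grid position of each zero-energy bin corresponds to the kind of RMSD and RG coordinates reported for the basins below.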

First, the FEL of 45934388 reveals two minima (Fig. 10A), with an RMSD value of 0.188242 nm and RG values of 2.22322 and 2.22706 nm. These minima are closely spaced, appearing visually as a single extended band minimum, which suggests a stable system with slight structural flexibility. The small differences in RG and RMSD indicate that localized regions of stability are connected by low-energy barriers. This allows the system to transition easily between these states, maintaining overall stability with minimal fluctuations.

Next, the FEL of CNP0344929 (Fig. 10B) reveals four basins, with RMSD values of 0.187788 and 0.196129 nm and RG values of 2.19577, 2.20414, 2.19298, and 2.19856 nm. The system exhibits more diversity in its basins, which suggests a higher degree of flexibility compared to 45934388. While the RMSD and RG values show a slight increase, the basins are still relatively close to one another, indicating that the system can transition smoothly between these local minima.

For CNP0360040, the FEL (Fig. 10C) reveals three minima, with RMSD values of 0.14608 and 0.15287 nm and RG values of 2.21584 and 2.21893 nm. The minima at the lower RMSD and RG values suggest a more stable structure with limited flexibility. However, the presence of a slightly higher RG minimum indicates a potential for increased structural flexibility, while the system maintains overall stability. This dataset reflects a system that is adaptable yet retains its structural integrity, with smooth transitions between these low-energy states.

For CNP0309850 (Fig. 10D), the FEL shows four basins, all at an RMSD value of 0.212183 nm, with RG values of 2.22029, 2.22354, 2.22680, and 2.23005 nm. The system exhibits more flexibility, with the basins more spread out, indicating a broader range of configurations. This broader distribution, which appears as one extended band in the FEL, reflects a balance between stability and flexibility, where the system can adopt a variety of conformations while still maintaining a certain level of structural integrity.

The FEL of CNP0426983 (Fig. 10E) reveals two basins with RMSD values of 0.231581 nm and 0.241027 nm and an RG value of 2.2127 nm. The separation between the basins and the higher RMSD values indicate a less stable configuration. The system is much more flexible, with larger energy barriers between the basins. This suggests that the system experiences more significant structural transitions and fluctuations, leading to a less stable but more adaptable configuration. The energy required to move between these basins suggests that the system is more prone to structural changes and less rigid than the complexes with the previously discussed ligands.

Finally, the FEL of 46220502 (control) (Fig. 10F) reveals two basins with an RMSD value of 0.187902 nm and RG values of 2.22147 and 2.22591 nm. Although the basins are relatively close to each other, the presence of two distinct basins indicates that the system experiences more transitions between local minima. The RMSD and RG values suggest a moderately stable system, with a compact structure but noticeable flexibility. The energy barriers between the basins are higher than in the hit compounds, reflecting more dynamic behavior and slightly reduced stability.

Discussion

Alzheimer’s disease (AD) is a progressive and irreversible brain disorder that primarily affects older adults, leading to a gradual decline in cognitive functions. The disease develops long before symptoms become apparent, often reducing the effectiveness of treatments by the time it is diagnosed. The global burden of Alzheimer’s disease is expected to increase substantially, rising from 26.6 million cases in 2006 to an estimated 106.8 million by 2050. Despite its prevalence, there is limited knowledge about the bioactivity of drug compounds that could serve as treatment options for AD. A promising target in AD pathology is DYRK1A, a kinase implicated in the formation of tau oligomers in the brain. This study used a QSAR model to predict the bioactivity of drug compounds targeting DYRK1A.

The molecular docking results, with docking scores ranging from −10.194 to −13.337 kcal/mol (45934388 exhibiting the highest score), suggest a strong projected occupancy of the DYRK1A ATP pocket by many compounds from our screening. Studies on structure-based discovery of DYRK1A inhibitors have reported analogous docking score ranges (e.g., −8 to −13 kcal/mol for flavone and leucettine analogues), characterised by binding poses primarily anchored in the hinge region (Glu239, Leu241) and hydrophobic interactions with Phe170, Phe238, and Val17361,62,63. Our leading compounds replicate this canonical binding mode, indicating they are mechanistically aligned with established DYRK1A inhibitors. Significantly, while docking scores give a valuable ranking, their interpretation requires supplementation with dynamic stability and free-energy rescoring, as demonstrated in our MD and MM/PBSA analyses.

The 3D binding pose analysis indicated that the top five hit compounds occupied the active site of DYRK1A, interacting with key residues. This was further corroborated by the 2D binding profile, which highlighted significant hydrophobic interactions, hydrogen bonding, and pi-stacking interactions. Notably, compound CNP0360040 exhibited an extensive interaction profile, forming hydrogen bonds with Asp287, Lys289, Asp307, Lys167, and Leu241, along with pi-pi stacking interactions with Phe170. These interactions suggest a strong and stable binding conformation, potentially enhancing inhibitory activity. In comparison, the control drug also interacted with key active site residues but exhibited a less diverse interaction profile.

Our in silico ADMET analysis showed generally favourable drug-like characteristics for the hits. All hit compounds adhered to Lipinski’s Rule of Five (Ro5), suggesting favourable oral bioavailability, while the control drug exhibited one violation, possibly due to its slightly higher molecular weight. An essential criterion for AD therapeutics is the ability to cross the blood–brain barrier (BBB). All hit compounds except CNP0426983 were predicted to cross the BBB, a crucial quality of CNS-active drugs, aligning with previous reports that DYRK1A inhibitors such as harmine and leucettine derivatives are BBB permeable and produce in vivo neuroprotective effects61,64. The majority of the hit compounds showed moderate LogP values (2–4) with PSA values below ~120 Ų, a profile linked with good CNS exposure65.
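The Ro5 screen mentioned above amounts to counting how many of four thresholds a molecule exceeds (molecular weight above 500 Da, LogP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The sketch below is illustrative only. The function name `lipinski_violations` is hypothetical, the LogP, HBD, and HBA inputs are the values quoted for the control in the QSAR discussion of this section, and the molecular weight of roughly 506.6 Da for abemaciclib is an assumption that is not stated in the text.

```python
def lipinski_violations(mol_weight, logp, hbd, hba):
    """Count Lipinski Rule-of-Five violations for one molecule."""
    checks = [
        mol_weight > 500,   # molecular weight in Da
        logp > 5,           # octanol-water partition coefficient
        hbd > 5,            # hydrogen-bond donors
        hba > 10,           # hydrogen-bond acceptors
    ]
    return sum(checks)

# Control ligand: LogP, HBD, HBA as quoted in this section;
# the molecular weight (~506.6 Da) is an assumed value.
control_violations = lipinski_violations(mol_weight=506.6, logp=3.8, hbd=1, hba=8)
print(control_violations)
```

Under these inputs the only threshold exceeded is molecular weight, which is consistent with the single violation attributed to the control drug.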

Furthermore, these compounds were predicted to be potent inhibitors of CYP2D6 but moderate inhibitors and substrates of CYP2C9 and CYP3A4, suggesting a balanced metabolic profile. The potential metabolism via CYP450 enzymes indicates that these compounds can be biotransformed efficiently, which may aid in their systemic clearance and reduce toxicity risks. Absorption studies revealed that all lead compounds demonstrated moderate human intestinal absorption but were ineffective Caco-2 permeants, except for CNP0360040 and the control drug. The limited permeability across Caco-2 cells suggests modifications may be necessary to enhance oral bioavailability. However, their moderate inhibition of P-glycoprotein reduces the likelihood of extensive efflux, which could otherwise limit their bioavailability and therapeutic efficacy.

Cardiac safety is an essential aspect of drug development, and the compounds in this study demonstrated moderate inhibition of hERG channels, indicating a relatively low risk of cardiac arrhythmia. Moreover, the toxicity assessment suggested that the compounds exhibit moderate mutagenicity with minimal potential for hepatotoxicity, making them promising candidates for further investigation. While these ADMET projections are optimistic, they should be treated as prioritising criteria. Experimental PK, microsomal stability, and BBB assays will be necessary to confirm our computational findings, in accordance with recent DYRK1A pipelines that integrated in silico ADMET with rodent PK and efficacy validation28,66.

The QSAR model, Kpls_molprint2D_49, demonstrated high reliability, with an R² of 0.5245 and a Q² of 0.5228, indicating good predictive performance. The prediction of pIC50 values for the lead compounds provided an additional layer of screening, aiding in the prioritization of promising candidates before experimental validation. Interestingly, despite the higher binding affinities observed in docking, the predicted pIC50 values were comparable to that of the control drug, suggesting that potency is influenced by multiple physicochemical factors beyond docking scores alone. The QSAR data and descriptor analysis showed the structural characteristics associated with projected DYRK1A inhibition. Compound 45,934,388, exhibiting the highest pIC₅₀ (6.16), demonstrated a modest LogP (2.8) and a considerable hydrogen bonding capacity (HBD = 4, HBA = 7). This indicates a balance between hydrophobic contacts within the ATP pocket and hinge-directed hydrogen bonding. Conversely, CNP0360040 (pIC₅₀ = 5.75) exhibited a greater LogP (3.5) with HBD (4) and HBA (8), with a PSA of 126 Ų, indicating that excessive polarity and surface area could diminish permeability and total anticipated potency. CNP0309850 and CNP0426983 exhibited moderate activity (pIC₅₀ ≈ 6.0), alongside elevated PSA values (152 and 104 Ų, respectively), which may restrict bioavailability despite advantageous hydrogen bonding potential. Notably, the reference drug, 46,220,502, exhibited a high LogP (3.8), a low HBD (1), and a moderate HBA (8), resulting in the lowest PSA (75 Ų). This optimised profile elucidates its enhanced projected activity (pIC₅₀ = 6.32) and aligns with prior studies indicating that effective kinase inhibitors typically achieve a balance between hydrophobicity and restricted polar surface area to improve permeability65. This finding also support Hu and Bajorath and Kothiwale et al. who reported that higher lipophilicity and lower polar surface area are associated with higher pIC5031,67. 
This reinforces the importance of integrating QSAR modeling into drug discovery workflows to gain a more comprehensive assessment of bioactivity. The results indicate that the identified compounds have substantial predicted inhibitory potential against DYRK1A, warranting further optimization and experimental validation through in vitro and in vivo studies. In combination with docking and ADMET profiling, the QSAR data provide an additional validation step and ensure that compound prioritisation is based not only on binding affinity but also on predicted potency and drug-likeness.
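To put the predicted potencies on a more intuitive scale, the short sketch below converts the pIC₅₀ values quoted above into IC₅₀ concentrations using the standard definition pIC₅₀ = −log₁₀(IC₅₀ in mol/L). The function name and the dictionary layout are illustrative assumptions; only the three pIC₅₀ values are taken from the text.

```python
def pic50_to_ic50_um(pic50: float) -> float:
    """Convert pIC50 (= -log10 of IC50 in mol/L) to IC50 in micromolar."""
    return 10.0 ** (-pic50) * 1e6


# Predicted pIC50 values quoted in the discussion (compound labels as used in the text).
predicted_pic50 = {
    "45934388": 6.16,
    "CNP0360040": 5.75,
    "46220502 (abemaciclib, control)": 6.32,
}

# Hypothetical helper table: predicted IC50 in micromolar for each compound.
ic50_um = {name: pic50_to_ic50_um(p) for name, p in predicted_pic50.items()}

for name, value in ic50_um.items():
    print(f"{name}: pIC50 = {predicted_pic50[name]:.2f} -> IC50 ~ {value:.2f} uM")
```

On this scale the three predictions fall between roughly 0.5 and 1.8 µM, and the gap between 45934388 and the control corresponds to about a 1.4-fold difference in predicted IC₅₀, which illustrates why the values are described as comparable.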

The molecular dynamics simulations of DYRK1A kinase in complex with the six ligands revealed distinct stability and interaction patterns. The root-mean-square deviation (RMSD) analysis indicated that the protein generally maintained a stable conformation across simulations, with RMSD values fluctuating within a consistent range. However, ligand RMSD trends varied considerably, reflecting differences in binding stability and conformational adjustment. Some ligands, such as CNP0344929 and CNP0426983, exhibited larger RMSD fluctuations early in the simulation before stabilizing, suggesting initial instability followed by settling into a more favorable binding conformation. Others, such as 45934388, maintained a relatively stable pose throughout, indicating stable binding and minimal conformational drift. Similar patterns of RMSD stabilization following equilibration have been reported in recent DYRK1A MD studies, in which sustained low RMSD was linked with biochemical potency28,68.
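As a rough illustration of how an RMSD trace and its "stabilization" can be quantified, the following sketch computes the RMSD between two coordinate sets and finds the first frame from which a sliding window of RMSD values stays within a tolerance. It assumes the frames have already been superimposed on the protein, and the window size, tolerance, and toy RMSD series are hypothetical values chosen for demonstration, not data from this study.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates (same units as input)."""
    if len(frame) != len(reference) or not reference:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(reference))


def first_stable_frame(series, window, tolerance):
    """Start index of the first window whose (max - min) spread is within tolerance, else None."""
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        if max(chunk) - min(chunk) <= tolerance:
            return start
    return None


# Two-atom toy example: every atom is displaced by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = rmsd(frame, reference)

# Hypothetical ligand RMSD trace (angstroms): early excursion, then a plateau.
ligand_rmsd = [0.5, 2.8, 3.4, 2.1, 1.6, 1.5, 1.6, 1.5, 1.4]
stable_from = first_stable_frame(ligand_rmsd, window=4, tolerance=0.3)
print(example_rmsd, stable_from)
```

A spread-based criterion is only one simple way to define a plateau; block averaging or a running standard deviation would be equally reasonable choices.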

Root-mean-square fluctuation (RMSF) analysis of residue-specific mobility provided additional insight into the protein’s flexibility, particularly in regions outside the binding pocket. The kinase’s active site showed reduced flexibility upon ligand binding, emphasizing the stabilizing effect of these compounds. However, loop and terminal regions remained dynamic, which may be functionally important for the enzyme’s activity and regulation. Protein-ligand contact analysis highlighted key interactions, including hydrogen bonding, hydrophobic contacts, and water-mediated bridges, which contributed to binding stability. While some ligands retained crucial docking interactions, others formed new interactions during the simulation, suggesting ligand-induced conformational rearrangements. Contact occupancy analysis indicated that persistent hinge H-bonds and hydrophobic packing (Leu241, Glu239, Phe238, Val173) separated stable complexes from unstable ones. These sustained interactions were reflected in favorable MM/PBSA binding energies (e.g., −49.86 kcal·mol⁻¹ for abemaciclib and −46.67 kcal·mol⁻¹ for CNP0360040). This congruence between structural dynamics and energetic profiles emphasises the robustness of our prioritization pipeline, in line with other in silico DYRK1A inhibitor investigations52,66.
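The idea of contact occupancy can be sketched in a few lines: for each residue, count the fraction of trajectory frames in which it is in contact with the ligand, then keep the residues above a persistence threshold. The per-frame contact sets and the 0.7 threshold below are hypothetical and serve only to show the calculation; the residue names are those mentioned in the text.

```python
def contact_occupancy(frames):
    """Fraction of frames in which each residue is in contact with the ligand.

    frames: list of sets, one per frame, holding the residue labels in contact.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    counts = {}
    for contacts in frames:
        for residue in contacts:
            counts[residue] = counts.get(residue, 0) + 1
    return {residue: n / len(frames) for residue, n in counts.items()}


# Hypothetical contact sets for five frames (not taken from the simulations).
frames = [
    {"Leu241", "Glu239", "Phe238"},
    {"Leu241", "Glu239", "Val173"},
    {"Leu241", "Phe238"},
    {"Leu241", "Glu239", "Phe238", "Val173"},
    {"Leu241", "Glu239"},
]

occupancy = contact_occupancy(frames)

# Residues present in at least 70% of frames (assumed persistence cut-off).
persistent = sorted(res for res, occ in occupancy.items() if occ >= 0.7)
print(occupancy, persistent)
```

In practice the per-frame contact sets would come from a distance or hydrogen-bond geometry criterion applied to the trajectory, and the threshold is a judgment call rather than a fixed convention.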

The molecular dynamics simulations give significant insight into the dynamic stability of DYRK1A–ligand complexes beyond the original docking predictions. The most stable ligands (CNP0360040 and 45934388) displayed sustained hinge-region hydrogen-bond interactions with Glu239 and Leu241, combined with hydrophobic contacts involving Phe238 and Val173. These interactions parallel previously reported binding determinants of DYRK1A inhibitors, where hinge anchoring has been emphasised as a requirement for potency and selectivity63. In contrast, CNP0344929 displayed substantial ligand RMSD excursions and weaker pocket stabilization, reflecting its reliance on transient water-mediated interactions rather than direct hinge engagement. The RMSF analysis supported these observations, as complexes with stable ligands showed reduced flexibility (<3 Å) in hinge and ATP-binding pocket residues, while less stable ligands left loop regions more mobile. Such ligand-induced stabilization of the active site is consistent with earlier kinase MD studies in which rigidification of hinge-adjacent residues correlated with sustained inhibitory activity62. Similarly, the interaction persistence analysis indicated that high-occupancy hydrogen bonds and stable hydrophobic packing are major predictors of ligand reliability, in keeping with prior in silico assessments of kinase inhibitors45,69. The MM/PBSA calculations further quantified binding affinities, confirming the energetic favorability of the ligand interactions. The free-energy landscape (FEL) analysis supported these findings by mapping the most stable conformations for each complex, with lower-energy states indicating favorable binding poses.
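A free-energy landscape of the kind referred to above is commonly estimated by Boltzmann inversion of a two-dimensional histogram, G = −k_BT ln(P/P_max), so that the most populated bin sits at zero. The sketch below is a minimal version of that calculation under stated assumptions: the two collective variables, the bin width, the 300 K temperature, and the toy sample points are all hypothetical, since the text does not specify how the FEL was constructed.

```python
import math
from collections import Counter

KB_KCAL = 0.0019872041  # gas constant in kcal/(mol*K), i.e. Boltzmann constant per mole


def free_energy_landscape(points, bin_width, temperature=300.0):
    """Map each populated 2D bin to G = -kT * ln(P / P_max), in kcal/mol."""
    if not points:
        raise ValueError("at least one sample is required")
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width)) for x, y in points
    )
    max_count = max(counts.values())
    kt = KB_KCAL * temperature
    return {bin_id: -kt * math.log(n / max_count) for bin_id, n in counts.items()}


# Hypothetical projection of frames onto two collective variables:
# eight frames in one basin and two frames in a neighbouring, less populated one.
samples = [(0.1, 0.1)] * 8 + [(1.2, 0.3)] * 2

fel = free_energy_landscape(samples, bin_width=1.0)
for bin_id, energy in sorted(fel.items()):
    print(bin_id, round(energy, 3))
```

Because only populated bins receive a value, sparsely sampled regions are simply absent from the map; longer sampling or reweighting would be needed before interpreting barrier heights quantitatively.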

This study is limited by its complete dependence on in silico techniques. QSAR predictions are valid only within the model’s applicability domain and require experimental validation. Docking and MD provide valuable mechanistic insights but cannot replace biochemical IC₅₀ experiments, selectivity profiling, or in vivo PK/efficacy studies. Future work should therefore focus on in vitro DYRK1A enzymatic assays for the top-ranked hits.

Conclusion

This study used a computational approach to identify and evaluate potential DYRK1A inhibitors for Alzheimer’s disease treatment. Through molecular docking, MD simulation, ADMET profiling, and QSAR modeling, we identified five promising compounds with strong binding affinities, favorable predicted pharmacokinetic properties, and promising predicted biological activity. These compounds exhibit potential as novel DYRK1A inhibitors, which could contribute to the development of more effective therapeutic strategies for AD. However, it is essential to acknowledge the limitations of computational approaches, as experimental validation remains necessary to confirm the efficacy and safety of these compounds. Future research should focus on in vitro and in vivo studies to assess their therapeutic potential and optimize their pharmacological profiles. Furthermore, structure-activity relationship studies and pharmacophore modeling could help refine these compounds for enhanced potency and selectivity. Ultimately, integrating computational and experimental efforts will be crucial in advancing these inhibitors towards clinical application, offering hope for improved treatment options for Alzheimer’s disease.

Fig. 1

Superposition of the control drug, abemaciclib (green sticks), with the five top hit compounds (colored sticks) in the active site of DYRK1A. The control drug and the five hit compounds occupied a similar region of the active site.

Fig. 2

2D representation of the molecular interactions between the side chains of the amino acid residues in the active site of DYRK1A and the atoms of the hit compounds. (a) 10-Formyl-3,9-dihydroxy-4-(hydroxymethyl)-1,7-dimethyl-6-methylidenebenzo[b][1,4]benzodioxepine-2-carboxylic acid (45934388), (b) 7-[4-hydroxy-3-(hydroxymethoxy)phenyl]-1-(4-hydroxy-3-methoxyphenyl)hept-4-en-3-one (CNP0344929), (c) Aliarin (CNP0360040), (d) NSC695598 (CNP0309850), (e) Wailupemycin L (CNP0426983), and (f) Abemaciclib (46220502).

Fig. 3

Scatter plot showing the performance of the best QSAR model, built using the AutoQSAR module of Schrödinger’s Maestro.

Fig. 4

Protein (teal blue) and ligand (magenta) RMSDs of the complexes between DYRK1A kinase and (A) 45934388, (B) CNP0344929, (C) CNP0360040, (D) CNP0309850, (E) CNP0426983, and (F) 46220502 (control) throughout a 200-ns simulation.

Fig. 5

Protein RMSFs of the complexes between DYRK1A kinase and (A) 45934388, (B) CNP0344929, (C) CNP0360040, (D) CNP0309850, (E) CNP0426983, and (F) 46220502 (control) over the course of a 200-ns simulation. The vertical green lines indicate residues directly involved in binding interactions.

Fig. 6

Protein-ligand contact histograms illustrating the interactions between DYRK1A kinase and (A) 45934388, (B) CNP0344929, (C) CNP0360040, (D) CNP0309850, (E) CNP0426983, and (F) 46220502 (control) throughout a 200-ns simulation.

Fig. 7

2D protein-ligand contact summary of the interactions between DYRK1A kinase and (A) 45934388, (B) CNP0344929, (C) CNP0360040, (D) CNP0309850, (E) CNP0426983, and (F) 46220502 (control) throughout a 200-ns simulation.

Fig. 8

PCA depicting the conformational dynamics of DYRK1A kinase during 200-ns simulations with various ligands. The analysis presents the top three principal components (PC1, PC2, PC3), which capture the majority of motion and structural variation in the kinase-ligand complexes. Panels A–F represent the trajectories of the following compounds: (A) 45934388, (B) CNP0344929, (C) CNP0360040, (D) CNP0309850, (E) CNP0426983, and (F) 46220502 (control).

Fig. 9

The DCCM analysis of DYRK1A kinase reveals the correlated movements of different regions of the kinase during 200-ns simulations with various ligands: (A) 45934388, (B) CNP0344929, (C) CNP0360040, (D) CNP0309850, (E) CNP0426983, and (F) 46220502 (control).

Fig. 10

FEL contour plots illustrating the stability and conformational dynamics induced by DYRK1A kinase interactions with various ligands: (A) 45934388, (B) CNP0344929, (C) CNP0360040, (D) CNP0309850, (E) CNP0426983, and (F) 46220502 (control).