Introduction

Acute myeloid leukemia (AML) is a devastating hematological malignancy characterized by the rapid proliferation of immature myeloid cells, disrupting normal blood cell production and function1,2. The incidence of AML increases with age, with a median age of ≥ 65 years at the time of diagnosis3,4. Compared with their younger counterparts, older patients have AML that is more frequently associated with chemotherapeutic resistance, unfavorable cytogenetics, increased frequency of somatic mutations, and is preceded by myelodysplastic syndromes, making therapeutic decisions difficult5. Many patients have their treatment chosen more based on chronological age rather than the inherent disease biology and overall fitness of patients6.

Currently, with the advancement of new diagnosis and treatment technologies, the survival rate of AML patients has improved significantly; however, the long-term survival rate of patients remains poor. For patients < 60 years old, according to the related previous studies, the 5-year overall survival (OS) rate is less than 40%; for the majority of patients with AML (aged over 60 years old), the 5-year OS rate is only 10–20%7,8. Meanwhile, AML patients are mainly accompanied by a poor prognosis. Even though complete remission (CR) with intensive induction chemotherapy is possible, over half of young adult patients and about 90% of elderly patients still succumb to the disease9. Therefore, finding effective solutions could diminish mortality by early diagnosis and prognosis of patients with AML.

Eclipta prostrata is an annual herb that is native to China, Japan, and India, and is used as a traditional Chinese herbal medicine to treat many diseases, including cancer10,11. Interestingly, beyond its traditional uses, growing evidence suggests that the phytochemicals derived from E. prostrata may harbor unexplored mechanisms of action against cancer, offering novel opportunities for therapeutic development12. E. prostrata has diverse pharmacological properties, including anti-inflammatory, antioxidant, and hepatoprotective effects, and has recently emerged as a promising source of bioactive compounds with potential anti-cancer properties13,14.

While the anti-cancer potential of E. prostrata has been recognized, the specific mechanisms by which its constituent compounds exert their effects against AML remain largely unexplored. Elucidating the mechanisms of action of E. prostrata against AML is of great importance, as it can lead to the identification of novel therapeutic interventions and biomarkers.

Therefore, the present structural and systems biology study aims to elucidate the mechanism of action of E. prostrata derived compounds against AML using comprehensive in-silico analysis, including network pharmacology, Quantum Chemistry, Gene Ontology, KEGG Pathway, Protein–Protein Interaction Networks, Molecular Docking, Molecular Dynamics Simulation, Post-dynamics simulation MMGBSA, Principle Component Analysis (PCA), Dynamic Cross-Correlation Matrix (DCCM), 3D-QSAR Modeling, Gene-Regulatory Networks, Gene Expression, Survival Analysis, and Cancer hallmarks analysis to identify novel therapeutic targets, novel therapeutics, the specific regulatory elements that may interfere with therapeutic benefits, and potential biomarkers for early and better diagnosis and prognosis of AML. This knowledge can contribute to the development of more effective and personalized treatment approaches, potentially improving the clinical outcomes and prognosis for patients with this devastating cancer. The graphical abstract of this study is presented in Fig. 1.

Fig. 1
figure 1

The workflow of this study includes all utilized approaches.

Methods and materials

Selection of compounds

E. prostrata is a medicinal plant that has reported therapeutic potential in the treatment of AML, the compounds of which have been obtained from the IMPPAT (Indian Medicinal Plants, Phytochemistry and Therapeutics) 2.0 database (https://cb.imsc.res.in/imppat/). IMPPAT 2.0 is a comprehensive database that contains information on medicinal plants and their phytochemical constituents, providing a valuable resource for the identification of candidate compounds15. To further enrich the compound selection process, the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) was used to obtain taxonomic information on E. prostrata, and a comprehensive literature search was conducted using the PubMed database (https://pubmed.ncbi.nlm.nih.gov/) to gather information on the traditional uses, therapeutic potential, and reported bioactivities of compounds found in E. prostrata16,17. The identified compounds of interest from the IMPPAT 2.0, PubChem, and PubMed Literature searches were then collected from the PubChem database in SMILE and 3D structured data file (sdf) format. By integrating these three databases, we have compiled a list of compounds from this plant.

Screening of compounds

Drug-likeness, physicochemical, pharmacokinetics, and toxicity prediction

Before exploring the complex network of compounds, the determination of drug-likeness, physicochemical, pharmacokinetics, and toxicity properties is required to avoid the identification of any targets from non-druglike, inactive, and toxic compounds. Hence, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties of the candidate compounds were evaluated using a combination of in-silico tools. Specifically, we employed the SwissADME platform (https://www.swissadme.ch/) to assess the druglikeness, Physicochemical, and pharmacokinetic characteristics of the compounds, including their solubility, permeability, and metabolic stability18. SwissADME is a comprehensive web-based tool that utilizes the physicochemical properties of compounds to predict key ADMET parameters, allowing us to identify compounds with favorable drug-like characteristics. Additionally, we utilized the pkCSM (Predicting Small-Molecule Pharmacokinetic and Toxicity Properties) platform (https://biosig.lab.uq.edu.au/pkcsm/) to evaluate the potential toxicity of the compounds, including endpoints such as hepatotoxicity, cardiotoxicity, and mutagenicity19.

Calculation of quantum chemical properties

Apart from the assessment of drug-likeness, physicochemical, pharmacokinetics, and toxicity profiling, evaluating the thermodynamic, electronic, reactivity, stability properties, and potential off-target possibilities of the compounds also provides valuable insights into the activity and inactivity of compounds. Higher reactive compounds tend to produce off-target effects and breakdown by body enzymes, and conversely, higher stable compounds are dormant to produce an effect20. On the other hand, Thermodynamic parameters, such as enthalpy and entropy, indicate spontaneous formation and stability, while electronic properties, including chemical hardness, determine reactivity21,22. Harder molecules denote strong stability and less reactivity, while softer molecules exhibit higher reactivity23. Therefore, compounds with optimal reactivity, stability, thermodynamics, and electronic properties are desired to avoid wasting time on unfavorable compounds. During these quantum chemical analysis processes, the thermodynamic properties, electronic properties, frontier molecular orbital analysis, frequency analysis, and density of states analysis were performed utilizing Density Functional Theory (DFT) in Gaussian software24. The B3LYP algorithm and the computationally expensive 6-311G basis were set to maintain higher accuracy during the DFT calculations. The equations applied to calculate DFT are represented below:

$${\text{Gap }(\Delta E) = (E}_{\text{LUMO}}- {E}_{\text{HOMO}})$$
(1)
$$\text{Hardness},\upeta =\frac{{(E}_{\text{LUMO}}- {E}_{\text{HOMO}}) }{2}$$
(2)
$$\text{Softness},\text{ S}= \frac{1}{2\upeta }$$
(3)
$$\text{Electronegativity},\upchi = - \frac{{(E}_{\text{LUMO}}+ {E}_{\text{HOMO}}) }{2}$$
(4)
$$\text{Mean Energy},\upmu =\frac{{(E}_{\text{LUMO}}+ {E}_{\text{HOMO}}) }{2}$$
(5)
$$\text{Electrophilicity},\upomega = \frac{{\upmu }^{2}}{2\upeta }$$
(6)

The results of thermodynamics studies, electronic properties, and frontier molecular orbital analysis were visualized using GaussView 06, and the frequency and density of states of compounds were visualized using GaussSum Software. By integrating ADMET, thermodynamics, and electronic properties of compounds, we have prioritized the most favorable compounds for further studies.

Compound-derived target prediction

After the initial screening of compounds, potential targets of the candidate compounds were identified by employing the SwissTargetPrediction platform (https://www.swisstargetprediction.ch/)25. This computational tool utilizes a combination of chemical similarity and known target-ligand interaction data to predict target proteins for specific compounds using machine learning algorithms. The rationale for using SwissTargetPrediction lies in its ability to leverage extensive databases that integrate chemical and biological information, enhancing the accuracy of target identification26. To further strengthen the target selection process, we have employed the BindingBD database (https://www.bindingdb.org/), which contains experimentally validated targets and compounds27. By combining predicted and experimentally validated targets, a list of the most promising targets of E. prostrata has been compiled.

Acute myeloid leukaemia relevant target prediction

Targets relevant to AML were identified from the MalaCards (https://www.malacards.org/) and DisGeNet databases (https://disgenet.com/)28,29. MalaCards (Integrated Database of Human Maladies) contains an integrated database of human maladies and their annotations, providing comprehensive information on disease-associated genes and targets. DisGeNet (Disease Gene Network), on the other hand, is a platform that curates data on gene-disease associations, including those relevant to various cancers and hematological disorders30,31. By querying these two complementary resources, we were able to cross-reference the predicted and experimentally validated targets of AML.

Identification of common targets

To identify common targets among the compounds derived from E. prostrata and disease-specific targets of AML, we utilized the Interactivenn platform (https://www.interactivenn.net/), a web-based tool for the visualization and analysis of the Venn diagram32. Interactivenn was selected due to its ability to facilitate the intuitive and interactive exploration of overlapping targets, allowing us to pinpoint the shared molecular targets across the compound set. This process involved uploading the lists of predicted targets for each compound, generated through the previous SwissTargetPrediction and BindingBD analysis, and AML-related targets from MalaCards and DisGeNet databases33. The Interactivenn platform then generated a Venn diagram to visualize the common and unique targets, enabling us to identify the intersecting set of targets from compounds and AML.

GO and pathways enrichment analysis

The biological functions and signaling pathways associated with the identified common target proteins were comprehensively analyzed utilizing a two-pronged approach. First, the DAVID (Database for Annotation, Visualization and Integrated Discovery) database (https://david.ncifcrf.gov/) was utilized to perform Gene Ontology (GO) analysis, covering the biological process (BP), molecular function (MF), and cellular component (CC) domains34. The DAVID tool provides a comprehensive set of functional annotation tools to understand the biological meaning behind large lists of genes35. The GO plot was visualized using SRplot (https://www.bioinformatics.com.cn/en), which integrated more than a hundred commonly used data visualization and graphing functions36. Second, the ShinyGO v0.8 (http://bioinformatics.sdstate.edu/go/) platform was used to assess the enrichment of KEGG pathways related to the target proteins37,38. ShinyGO is a user-friendly web server that integrates various pathway databases and allows for the visualization and interpretation of pathway-level associations37,38. The combination of GO and pathway enrichment analysis provided a deeper understanding of the molecular mechanisms and cellular processes underlying AML.

Protein–protein interaction network analysis

Protein–protein analysis was performed to understand the specific interactions among proteins in a large complex protein–protein interaction network39. Protein–protein interaction network was analyzed using the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database (https://string-db.org/), which is a comprehensive, high-quality resource that provides information on known and predicted protein–protein interactions, including direct (physical) and indirect (functional) associations40. The STRING database integrates data from numerous sources, such as experimental repositories, computational prediction methods, and public text collections, to construct a global network of protein–protein interactions41. In the current study, the STRING database was accessed to identify the interactions among the identified target proteins associated with AML.

Identification of hub proteins and protein clusters

The identification of hub targets provides insights into the most significantly associated proteins, and the identification of protein clusters means the identification of functional modules from the whole network that perform specific functions together42. By integrating the hub target identification and cluster analysis, our study aimed to uncover the most critical proteins and protein complexes that play pivotal roles in the pathogenesis of AML. The key hub targets and clusters within the protein–protein interaction network were identified using two complementary computational plugins of Cytoscape43. First, the Cytohubba (Cytoscape Plugin for Identification of Hub Proteins), which is a powerful bioinformatics software platform for visualizing complex networks and identifying critical nodes within these networks, was used to determine the hub targets based on various topological features, such as degree, betweenness centrality, closeness centrality, maximum neighborhood component (MCC), and maximal clique centrality (MNC). The Cytohubba plugin provides a comprehensive set of algorithms to identify the most influential or central nodes, which are considered potential hub targets44. Second, the MCODE (Molecular Complex Detection) plugin of Cytoscape was used to detect densely connected clusters within the protein–protein interaction network. MCODE is a clustering algorithm that can identify closely related protein complexes or functional modules, which may represent important biological processes or signaling pathways45. By evaluating gene ontology, pathways, hub targets, and protein clusters, we were able to prioritize the most significant targets for further drug-target association studies. All the networks of this study have been constructed and visualized using Cytoscape43.

Drug-target association studies

Identification and validation of drug targets

After identification of the significantly associated target proteins of AML, determination of and validation of potential drug targets for the treatment of AML are also crucial. Without blindly assuming any targets as drug or therapeutic targets, the Human Protein Atlas (HPA) (https://www.proteinatlas.org/) and Therapeutic Target Database (TTD) (https://idrblab.net/ttd/) were applied to gain insights into the sophisticated information on druggability and therapeutic targeting opportunities46,47. The HPA database was queried to assess the expression and druggability of target proteins associated with AML. In contrast, the TTD was subsequently utilized to validate the identified targets, evaluating their status as known therapeutic targets, approved or investigational drug targets, and targets that are currently in the clinical stage of development. By leveraging these two comprehensive databases, we aimed to identify and prioritize both the most promising drug targets that have been previously explored, targets that are currently under investigation for the treatment of AML, and novel targets that have not been explored for the treatment of AML yet. Therefore, this process allows us to select the most viable candidates for further drug discovery applications.

Molecular docking verification of drug targets

Molecular docking provides insight into the most suitable binding poses and binding affinities between targets and compounds. The ligands were collected from the PubChem database16, and the structures of proteins were retrieved from the Protein Data Bank database48, which focuses primarily on providing high-quality experimentally determined structures of proteins. The target-specific and FDA-approved anticancer drugs were chosen as controls from the DrugBank database49, which consists of a collection of approved, investigational, and experimental drugs. In this study, the AutoDock VINA tool from PyRx 0.8 software was utilized to perform molecular docking and calculate the binding energies between targets and compounds50. AutoDock VINA is a popular and fast virtual screening tool that helps in screening a large list of potential compounds. Apart from VINA docking, which mainly focuses on the speed of virtual screening, the most relevant and promising targets were further re-validated with GLIDE docking to elucidate and maintain the accuracy of docking51. Therefore, two consecutive popular docking tools were applied to evaluate the docking score. For all proteins, the experimentally determined co-crystallized inhibitory ligand binding sites were considered as the binding sites for docking studies.

Molecular dynamics simulation

Molecular docking calculates only binding poses, and therefore, to identify the interaction properties between targets and compounds in dynamic situations, molecular dynamics (MD) simulation studies were applied. We have utilized the most promising and high-affinity drugs that are already available as controls. The MD simulations were conducted using the DESMOND package, a widely used tool of Schrodinger’s suite for conducting MD simulation analyses52. An orthorhombic simulation box was employed to simulate realistic environmental conditions, and the System Builder tool was utilized to facilitate the system design. The Desmond package was utilized to run a 200 ns Molecular Dynamics simulation53. To keep the volumes of the systems constant, the orthorhombic periodic boundary box has been designated by adjusting the distance of (10 × 10 × 10 Å3) for each complex SPC water model. The OPLS_2005 force field was used to carry out the simulations, with counterions added to the models when necessary to neutralize them. A 0.15 M NaCl salt solution was included to simulate physiological conditions, and the NPT ensemble was used to maintain the temperature (300 K), pressure (1 atm), and moles throughout the simulation. The models were relaxed using pre-simulation methods before the production run. The MD simulation analysis included the assessment of root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), radius of gyration (RoG), solvent-accessible surface area (SASA), polar surface area (PSA), molecular surface area (MolSA), and specific protein–ligand contacts, such as hydrogen bonds, hydrophobic interactions, ionic interactions, and water-bridges. These comprehensive MD simulation studies provided insights into the structural stability, flexibility, and dynamic interactions of the target proteins with the selected drug candidates. Furthermore, the Bio3D package of R was used to perform Principal Component Analysis (PCA) and the dynamic cross-correlation matrix (DCCM) from the trajectory of the MD simulation54. PCA and DCCM are calculated using a script that is built in R. PCA and DCCM are essential to figuring out the conformational dynamics and correlated motions within protein–ligand complexes that provide a more detailed understanding of dynamic motions between the elements of protein–ligand complexes.

Binding free energy calculation using MM-GBSA

After analysis of different motion and dynamic properties, the binding free energy between targets and compounds was calculated to quantitatively assess the interaction strength between the target proteins and candidate compounds. To quantify the binding affinity, post-dynamic simulation MM-GBSA (Molecular Mechanics Generalized Born Surface Area) calculations were performed using the Prime module within the Schrödinger software suite53. The MM-GBSA utilizes the OPLS_2005 force field for accurate energy calculations and the Generalized Born model for implicit solvation to effectively represent the solvent environment without explicit solvent molecules. The Prime module of the Schrödinger suite was used for its robust implementation of MM-GBSA calculations and its compatibility with the previously generated MD trajectories. Binding free energies were calculated using snapshots extracted from the equilibrated phase of the MD simulations, which generated numerical values for binding free energies (ΔG bind), providing a quantitative assessment of the strength and stability of each protein–ligand complex, such as covalent interactions, van der Waals interactions, electrostatic interactions, Hydrogen bonding, solvation effects, etc.

Calculation of structure–activity relationship between compounds and targets

Though strong interactions between targets and compounds indicate positive signs in the drug discovery process, it doesn’t provide any insights into the potency and biological effects of a compound, specifically the dose–response relationship between compound and target. A compound might have higher binding affinity and binding free energy with targets, but that does not indicate the compound is active. Even a compound with higher binding affinity can be inactive, which means it cannot produce a biological response at a safer concentration55,56. An active drug is an ideal drug that can elicit a biological response with a lower drug concentration. Higher doses of drugs are often toxic and tend to produce adverse side effects57. Therefore, calculating the optimal dose that may produce an effective response by inhibiting or activating a target is crucial after identifying optimal poses and interactions between compounds and ligands. Before estimating the dose–response relationship, bioassay data that contain experimentally validated dose–response relationships between a range of active and inactive compounds with targets have been retrieved from the BindingDB database (https://www.bindingdb.org/rwd/bind/index.jsp)27. BindingDB is a database that contains experimentally validated bioassay data from in vitro studies of compounds against specific targets in the form of KI (Inhibition Constant), IC50 (Half-maximal Inhibitory Concentration), EC50 (Half-maximal Effector Concentration), and others. In this study, IC50 data have been considered because they provide insight into the inhibitory ability and response of compounds. IC50 stands for half maximal inhibitory concentration, which is suitable to establish a model of dose–response relationship. The dose–response relationship model from the 3D structure of compounds has been established using a Gaussian force field-based 3D quantitative structure–activity relationship study with Schrodinger software, where 70% of compounds have been utilized for the training dataset and 30% was for the validation dataset.

Gene regulatory networks analysis

The effects of targets and compounds can be significantly impacted by gene-regulatory elements such as microRNAs and Transcription Factors. These events occur after post-transcriptional modification periods and regulate the proper folding, maintenance, and activity of proteins. Therefore, a gene-regulatory network has been constructed to determine the key elements that might interfere with and impact the outcome of the target-compound interactions. To elucidate gene-regulatory networks (GRNs), the complex interactions of microRNAs (miRNAs) and transcription factors (TFs) with targets of AML were elucidated by employing NetworkAnalyst 3.0 (https://www.networkanalyst.ca/)58. Specifically, we utilized miRTarBase v9.0, a comprehensive database of miRNA-target interactions, and TTRUST, a database of TF-target interactions, to construct and analyze the GRNs59. We chose NetworkAnalyst 3.0 for this purpose because of its capability to integrate and analyze large-scale biological networks and miRTarBase v9.0 and TTRUST for their reliability and comprehensiveness in providing miRNA-target and TF-target interactions, respectively. These tools helped to identify key regulatory elements and their interactions that may interfere with drug efficacy and target engagement.

Potential biomarker identifications

Biomarkers indicate the presence or absence of a condition or disease, and hence, identifying biomarkers of AML is also crucial. Potential biomarkers for the diagnosis and prognosis of AML were identified using the Gene Expression Profiling Interactive Analysis (GEPIA 2) database (http://gepia2.cancer-pku.cn/)60. GEPIA 2 is a web-based platform that provides comprehensive and interactive analysis of gene expression data derived from the TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression) projects, encompassing a wide range of cancer types and normal tissue samples. In the current study, the GEPIA 2 tool was queried to assess the expression patterns of the identified target proteins associated with AML. Specifically, the tool was used to evaluate the differential expression of the target proteins between AML samples and normal control samples, as well as to investigate their association with patient survival outcomes. This analysis aimed to identify potential diagnostic and prognostic biomarkers that could be further validated and incorporated into clinical practice to improve the management of AML.

Biomarker validation and identification of novel biomarkers

A three-pronged approach was employed to validate the identified biomarkers and explore the potential for novel biomarker discovery. First, the Therapeutic Target Database (TTD) (https://db.idrblab.net/ttd/) was utilized to assess the current status and development stage of the biomarkers. The TTD is a comprehensive resource that provides information on known and explored therapeutic targets and known biomarkers61. By querying the TTD, the study aimed to validate the identified biomarkers and determine which of them have been previously explored or are currently under investigation for the diagnosis and prognosis of AML. Second, the study also focused on identifying novel biomarkers that have not yet been considered as potential biomarkers for AML. This was achieved by analyzing the target proteins that were not previously reported as biomarkers in the TTD or other relevant databases. Finally, we have further validated the biomarkers by utilizing literature mining from PubMed. Therefore, this approach not only validated the identified biomarkers but also uncovered novel abilities that could serve as potential diagnostic, prognostic, or monitoring biomarkers for improved management of AML.

Determination of cancer hallmarks

After gaining insights into the significant molecular events and molecular elements responsible for AML development and indication, a cancer hallmark study was performed using those key elements by employing the Cancer Hallmark database (https://cancerhallmarks.com/)62. The cancer hallmarks concept generates an enrichment plot for understanding the fundamental organizing principles for the prevention, diagnosis, and treatment of cancer. In the case of AML, we prioritized two sets of proteins for this study. One set contains all the potential targets initially identified from hub proteins, protein clusters, pathways, and significantly expressed proteins in AML. Another set contains only the common and most significantly expressed proteins of AML after careful evaluation of molecular docking, molecular dynamics simulation, gene expression, survival analysis, and comparative studies to summarize fundamental principles behind AML progression, which can further help in decision-making on both biomarker necessity and therapeutic interventions.

Statistical analysis

For statistical analysis, the p-value and FDR (False Discovery Rate) were applied to calculate the significance of Gene Ontology and KEGG pathway enrichments, respectively. PCA and DCCM were utilized for dimensionality reduction and correlation dynamics analysis of complex systems, respectively. The log-rank test was utilized to estimate the logrank p-value in Kaplan–Meier plots to compare survival curves. On the other hand, Spearman’s correlation was employed for gene expression analysis. Both the logrank test and Spearman’s correlation were utilized from the GEPIA2 platform60. For cancer hallmark analysis, the adjusted p-value has been considered. A threshold p-value of 0.05 was considered statistically significant in all cases.

Results

Selection of anti-cancer compounds

We reported the identification and characterization of 12 promising anticancer compounds through a multi-step screening and evaluation process. At first, we identified 451 potential anticancer compounds from the IMPPAT, PubChem, and PubMed databases through a systematic literature mining approach. Following the removal of duplicates, 187 unique compounds remained for subsequent analysis. A stringent ADMET filter was applied, incorporating drug-likeness, gastrointestinal absorption, water and lipid solubility, bioavailability, and hepatotoxicity criteria to identify 16 compounds with optimal pharmacokinetic and toxicity profiles. In Tables 1 and 2, the observed ADMET properties of 12 anti-cancer compounds from E. prostrata were reported. All compounds showed high GI absorption, no drug-likeness violations, and no hepatotoxicity. Water solubility varied between moderately soluble to soluble, while bioavailability scores ranged from 0.55 to 0.85, suggesting favorable ranges for bioavailability.

Table 1 12 compounds from E. prostrata with known anti-cancer activities.
Table 2 ADMET analysis of anti-cancer compounds from E. prostrata.

Table 3 details the electronic energy (EE), with zero-point, thermal, and thermal enthalpy and free energy corrections, along with optimization energy and dipole moment for 12 compounds (identified by PubChem CID). Energies are reported in Hartrees and dipole moments in Debyes. Electronic energy (EE) and its associated corrections (zero-point, thermal, thermal enthalpy, and thermal free energy), optimization energy, and dipole moment for twelve compounds, all expressed consistently across all compounds, including the control, suggesting a similarity in their fundamental electronic structures and behavior.

Table 3 Different energies with their corrections, optimization energy, and dipole moment.

The calculated physicochemical descriptors, hardness (ƞ), softness (S), electronegativity (χ), mean energy (μ), and electrophilicity index (ω) derived from HOMO and LUMO energies for 12 molecules have been presented in Table 4. The range of values for each descriptor across the compounds, including the control, is relatively narrow. Frontiers Molecular Orbital Analysis showed gaps between HOMO (Highest Occupied Molecular Orbitals) and LUMO (Lowest Unoccupied Molecular Orbitals) states of compounds between 3.17 eV and 4.91 eV for all compounds, including Control Drug where no large gaps have been observed and therefore, suggesting a similar reactivity and stability pattern of compounds (Fig. 2 and Supplementary Fig. S2). Higher HOMO and LUMO gaps mean higher stability, and lower HOMO and LUMO gaps mean higher reactivity. Notably, electrostatic potentials, vibrational frequencies, and frontier molecular orbital gaps of these compounds were closely similar to the control anti-leukemic drug.

Table 4 Physicochemical descriptors of the molecules. Hardness (ƞ), Softness (S), Electronegativity (χ), Mean energy (μ), and Electrophilicity index(ω).
Fig. 2
figure 2

Frontiers Molecular Orbital Analysis. (A) Acacetin, (B) Apigenin, (C) Coniferylaldehyde, (D) Coumestan, (E) Diosmetin, (F) Kaempferide, (G) Kaempferol, (H) Psoralen, (I) Skullcapflavone II, (J) Syringic acid, (K) Tricetin, (L) Vanillic acid, and (M) Selvita (Control Anti-leukemic Drug). The HOMO and LUMO gap range from 3.17 eV to 4.91 eV.

ESP (Electron Surface Potential) analysis is shown in Fig. 3, where the ESP of most of the compounds was between -9.752 a.u. to + 9.752 a.u. indicating compounds have a wide range of electrostatic interaction potentials except (D) Coumestan (-0.103 a.u. to + 0.103 a.u.). Infrared (IR) spectrum analysis measured the absorption of infrared light by each compound. A large vibrational frequency or light absorption was observed for all of the compounds at ~ 1700 cm-1, indicating the presence of any functional group at the same frequency (~ 1700 cm-1) for all compounds. Supplementary Fig. S1 represents the vibrational frequency of the 12 most promising compounds.

Fig. 3
figure 3

Electron Surface Potential (ESP) Analysis. (A) Acacetin, (B) Apigenin, (C) Coniferylaldehyde, (D) Coumestan, (E) Diosmetin, (F) Kaempferide, (G) Kaempferol, (H) Psoralen, (I) Skullcapflavone II, (J) Syringic acid, (K) Tricetin, (L) Vanillic acid, and (M) Selvita (Control Anti-leukemic Drug).

Identification of common targets of compounds and AML

Initially, the analysis of potential targets of anti-cancer compounds identified 953 targets. After removing redundant entries, this set was refined to 252 distinct targets. Separately, the study of AML targets was performed using the MalaCards and DisGeNet databases, resulting in an initial set of 3755 targets. Following the removal of duplicates, this set comprised 3262 unique targets. The intersection of these two target sets (anti-cancer compounds and AML) was then determined, revealing 121 common targets. The intersection shown in Fig.4 highlights potential shared targets between the compounds and AML.

Fig. 4
figure 4

Venn diagrammatic analysis of common targets between the targets of AML and the targets of compounds. The intersection indicates the common targets.

GO and KEGG pathways analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed to elucidate the functional roles of the identified 121 common targets. GO analysis revealed significant enrichment in biological processes crucial to cellular function and implicated in AML pathogenesis. The Biological Process terms revealed significant enrichment in fundamental cellular processes, such as phosphorylation, protein phosphorylation, negative regulation of apoptosis, and signal transduction. The Molecular Function terms further underscored the prevalence of kinase-related activities, including protein tyrosine kinase, serine/threonine kinase, and serine kinase functions. Additionally, the Cellular Component analysis indicated the localization of the target proteins within the cytoplasm, cytosol, receptor complexes, and nucleoplasm. KEGG pathway analysis further supported the therapeutic relevance of these targets in AML, showing significant enrichment in pathways associated with cancer, including the “AML” pathway. The enriched proteins in the AML pathway were FLT3, STAT3, KIT, PIM1, MAPK1, AKT1, PIM2, PIK3R1, MPO, RELA, NFKB1. The GO and KEGG pathway enrichment are presented in Fig. 5.

Fig. 5
figure 5

GO and KEGG Pathway analysis. (A) The bubble plot indicates enrichment of GO terms, and (B) bar plot indicates KEGG pathway enrichment.

Hub targets and protein clusters of protein–protein interaction networks

At first, the protein–protein interaction (PPI) network was constructed using the STRING 12 database with the highest confidence interaction score of 0.900, focusing on the 121 common proteins identified. The visualized network in Cytoscape identified several key hub targets and protein clusters within the common targets of the selected compounds and AML. Using multiple centrality measures, including degree, closeness, betweenness, maximum neighborhood component (MCC), and maximal clique centrality (MNC), the top 10 hub targets were evaluated. Degree centrality measures the number of direct connections a node has and indicates the immediate influence or activity of a node in the network. Closeness centrality measures how close a node is to all other nodes in the network. Betweenness centrality measures how often a node lies on the shortest path between two other nodes. On the other hand, Maximum neighborhood component (MCC) centrality identifies the largest group of nodes that are all connected to each other, revealing influential clusters in the network, while Maximal clique centrality (MNC) measures the size of the largest complete subgroup of nodes that are all connected to each other, highlighting the most densely connected regions. By combining these centrality measures, the top 10 most influential or “hub” nodes in the network were identified and ranked. The top-ranked shared protein across all methods was AKT1, followed by ALB, STAT3, BCL2, CASP3, EGFR, HSP90AA1, ESR1, NFKB1, and PTGS2 (Fig. 6). Four distinct clusters were also identified within the network, with Cluster 1 demonstrating the highest score (29.778), while Clusters 2, 3, and 4 exhibited scores of 4.4, 4.308, and 3, respectively. Cluster analysis is essential in a network to identify groups of interconnected nodes with similar characteristics, revealing hidden patterns and structures within the network. Common proteins that resided in the AML pathway and hub network were AKT1. STAT3, while common proteins between the AML pathway and protein clusters were AKT1, STAT3, NFKB1, RELA, KIT, FLT3, MPO, and PIK3R1 (Fig. 7). Hence, these hub targets and clusters provide insights into the crucial proteins and functional modules associated with AML.

Fig. 6
figure 6

Identification of Hub proteins in 5 distinct centralities: (A) Degree, (B) Closeness, (C) MMC, (D) MNC, and (E) Betweenness. Red indicates a higher score in hubs, and Yellow indicates a lower score in hubs.

Fig. 7
figure 7

Determination of Protein Clusters represents functional units that may function together in a pathway or process. (A) Whole Network, (B) Cluster 1 (Score: 28.779), (C) Cluster 2 (Score: 4.4), (D) Cluster 3 (Score: 4.308), and (E) Cluster 4 (Score: 3).

Identification and validation of drug targets

The analysis of proteins involved in AML pathways has revealed several promising drug targets for the management of this aggressive form of leukemia. The findings demonstrate in Table 5 that many of the identified proteins have already been successfully targeted by FDA-approved drugs or are currently in clinical trials for other cancer types, suggesting their potential therapeutic relevance in the context of AML. Notably, proteins such as KIT and FLT3 are classified as FDA-approved drug targets and have demonstrated successful validation in the treatment of metastatic colorectal cancer and tenosynovial giant cell tumors, respectively. Similarly, PIK3CA, PIK3CD, and AKT1 have been identified as successful targets in lymphoma, mature B-cell leukemia, and breast cancer, respectively. In addition to these well-established targets, the study also identified several emerging targets, such as STAT3, PIK3CB, PIM1, and PIM2, which are not yet FDA-approved drug targets but are currently undergoing clinical trials for other cancer types, suggesting that these proteins may represent novel and untapped therapeutic opportunities for AML. Furthermore, the inclusion of MAPK1 and NFKB1/RELA, which are FDA-approved drug targets and are currently being explored in clinical trials for various cancers, further strengthens the potential of these proteins as viable targets for AML treatment. Moreover, MPO was identified as a potential drug target, although its therapeutic potential for AML remains to be fully explored.

Table 5 Current status of most influenced proteins as drug target potentials based on pathway involvement, hub proteins, and protein clusters.

Molecular docking verification of potential drug targets

FDA-approved drug targets AKT1 and STAT3 have been found in all hub protein networks, protein clusters, and AML pathways, indicating the possibility of becoming promising targets for AML treatment. On the other hand, TTD revealed that these two targets are already successfully approved drugs against colorectal and breast cancers. Interestingly, FLT3 is already an approved drug target against AML, and PIM1 has already been included in the clinical trial target for AML. Therefore, apart from the promising targets AKT1, KIT, STAT3, PIK3CA, PIK3CB, PIK3CD, PIK3R1, NFKB1, PIM2, and MAPK1, which are anti-apoptotic and help in the rapid proliferation of cancer cells in AML pathway, FLT3, and PIM1 have been considered as the most specific, relevant, promising, and significant therapeutic targets according to AML pathway, hub proteins and protein clusters. MPO has a role in the block of differentiation of cells, but lacks evidence of being a therapeutic target against cancer. The promising drug target pathway has been shown in Supplementary Fig. S3.

After initially screening using AutoDock VINA (Fig. 8), the scores for the most specific and promising targets (FLT3 and PIM1) were identified and revalidated using GLIDE Docking to maintain docking accuracy. For FLT3, higher GLIDE docking scores were observed in Kaempferol (-8.931 kcal/mol), and Apigenin (-8.752 kcal/mol) compared to the control drug Pacritinib (-5.403 kcal/mol). On the other hand, for PIM1, higher docking scores were observed for Tricetin (-8.634 kcal/mol) and Diosmetin (-7.780 kcal/mol) compared to the control drug SEL24 (-6.385 kcal/mol). Table 6 presents detailed information on the chosen PDB structure, ID of compounds and control drugs, and protein–ligand interaction data. Selected crystal structures of the target proteins from the Protein Data Bank database were X-ray crystallographic, with resolution < 3.0 Å, and containing co-crystallized inhibitory ligands, which enable direct identification of the inhibitory sites on the protein structures. The inhibitory sites were identified by observing the already experimentally validated co-crystallized ligand binding sites. Subsequently, performed VINA and GLIDE docking, where in the case of AutoDock Vina, a blind docking is performed and compared their interacting sites with the inhibitory sites (Table 6). A 20 Å cubic grid box was generated with the inhibitory ligand positioned at its center for GLIDE docking. The GLIDE docking validates whether the compounds fit into the inhibitory sites and bind to the pocket of the inhibitory sites. On the other hand, the blind docking of AutoDock VINA confirmed the binding of AKT1, FLT3, MAPK1, and PIM1 at the direct inhibitory sites where they interact with several overlapped amino acids responsible for the inhibitory action of protein activity. However, in the case of PIK3CA and PIK3CD, no direct overlapped amino acids have been observed between inhibitory sites and interacting sites, indicating they may induce inhibitory activity by modulating allosteric sites or actually not be involved in inhibitory activity at all. For STAT3, a diverse interaction pattern was observed, where the inhibitory site, the interacting sites of the compound, and the control drug were completely different, indicating uncertainty in inhibiting the biological activity of STAT3. Control drugs were FDA-approved and specific to target proteins.

Fig. 8
figure 8

The heatmap analysis of molecular docking results (AutoDock VINA). Dark blue color indicates strong binding affinity (Kcal/mol) and light blue color indicates weak binding affinity (Kcal/mol).

Table 6 Information on the crystal structure of proteins, inhibitory sites targeted by GLIDE docking, top compounds based on AutoDock VINA, and GLIDE docking compared to FDA-approved control target-specific anticancer drugs, and amino acids involved in protein–ligand interaction from blind molecular docking with AutoDock VINA.

Figure 8 and Supplementary Table S1 represent the VINA and GLIDE docking scores evaluated for complexes of promising targets and anti-cancer compounds of E. prostrata, respectively. Figure 9 shows the interactions between the target and compounds.

Fig. 9
figure 9

Molecular docking binding interactions. (A) PIM1-Diosmetin, (B) PIM1-Tricetin, (C) PIM1-SEL24 (Control) Complexes; (D) FLT3-Kaempferol, (E) FLT3-Apigenin, (F) FLT3-Pacritinib (Control) Complexes.

Molecular dynamics simulation for target verification

Molecular dynamics simulation revealed favorable dynamic properties of protein-compound complexes and also verified the potential of targets. Key properties like RMSD (Root Mean Square Deviation) and RMSF (Root Mean Square Fluctuation) are presented in Fig. 10. Other Properties, like Protein–Ligand Contact information, RoG (Radius of Gyration), SASA (Solvent Accessible Surface Area), PSA (Polar Surface Area), and MolSA (Molecular Surface Area), have been incorporated in Supplementary Figs. S4, S5, and S6, respectively. The average RMSD of PIM1-Diosmetin and PIM1-Tricetin were 2.02 Å and 2.16 Å, respectively. In comparison to these two compounds, the average RMSD of the control drug SEL24 was 2.32 Å. Lower RMSD means higher stability; therefore, our compounds indicate higher stability than the control drug. In the case of FLT3, the average RMSD of FLT3-Kaempherol, FLT3-Apigenin, and FLT3-Pacritinib (Control) was 2.90 Å, 2.91 Å, and 3.18 Å. The RMSD of control is higher than 3 Å, which is beyond the desired limit of 1–3 Å. In the case of the RMSF of PIM1, the larger peak was observed at amino acid position 171 with an RMSF of 7.00 Å for Tricetin and 4.99 Å for SEL24. Another peak was observed at position 233 with RMSF for Tricetin 5.39 Å and 3.82 Å for SEL24. For FLT3, the RMSF of Pacritinib (Control) provides several highest peaks at positions 48 (4.45 Å), 146 (5.54 Å), 162 (4.58 Å), and 225 (4.63 Å). The highest peak was observed at position 230 (4.17 Å) for Apigenin.

Fig. 10
figure 10

Molecular Dynamics Simulation Analysis. (A and C) RMSD and RMSF of PIM1-Diosmetin-Tricetin-SEK24 Complexes, respectively, (B and D) RMSD and RMSF of FLT3-Kaempferol-Apigenin-Pacritinib Complexes, respectively.

PCA and DCCM analysis

In a protein–ligand interaction, not the whole protein is favorable for the interactions between targets and compounds; instead, only a few percent of amino acids are responsible for these interactions. Identification of those percentages provides insights into the variance and stability during dynamic simulation. For the first 20 modes of motion, a graph of eigenvalues (protein) versus eigenvector index (eigenmode) in PCA plots is shown in Supplementary Fig. S7. The eigenvalues showed how the eigenvectors of hyperspace fluctuated. According to simulation analyses, the overall movement of the target protein is controlled by eigenvectors with larger eigenvalues. The highest proportion of variance observed in PC1 was 29.58%, 21.34%, and 21. 51% for PIM1 complexes with Diomestin, Tricetin, and SEL24 (Control). For PC3, the lowest variances observed were 7.41%, 9.23%, and 7.34%. On the other hand, the highest variance for PC1 was 34.3%, 38.2%, and 31.75% in FLT3 in complex with Kaempherol, Apigenin, and Pactrinib (Control), respectively. The lowest variances were 7.95%, 9.8%, and 7.07% for PC3.

DCCM (Dynamic Cross-Correlation Matrix) of the 200 ns molecular dynamics trajectory revealed the dominant modes of motion and the correlations between the motions of different residues or atoms within the protein–ligand complex, respectively. Both FLT3 and PIM1 showed positive and negative correlations in their distribution, according to the residue index maps (Supplementary Fig. S8). The pairwise cross-correlation coefficient values on the cross-correlation map are almost similar for all complexes, suggesting a significant association. In essence, PCA shows what moves, and DCCM shows how different parts move in relation to each other by quantifying the magnitude of fluctuations and identifying correlated/anti-correlated motions of key residues.

Binding free energy calculation using MMGBSA

MM-GBSA calculations for FLT3 revealed that both Kaempferol (-73.75 kcal/mol) and Apigenin (-68.76 kcal/mol) exhibited significantly stronger binding affinity than the control Pacritinib (-51.27 kcal/mol). This difference is primarily attributed to more favorable electrostatic, hydrophobic, and van der Waals interactions for Kaempferol and Apigenin. On the other hand, MM-GBSA calculations for PIM1 revealed that Tricetin (-64.28 kcal/mol) displays stronger binding affinity than both Diosmetin (-52.2 kcal/mol) and the control SEL24 (-53.38 kcal/mol). This enhanced binding is largely driven by significantly more favorable electrostatic and van der Waals interactions in Tricetin, despite less favorable hydrophobic contributions compared to SEL24. The substantial difference in solvation energy between Tricetin and SEL24 also contributes to the observed binding affinity difference. SEL24 exhibits a binding affinity of -53.38 kcal/mol, characterized by strong electrostatic interactions (-51.59 kcal/mol) but hindered by unfavorable solvation energy (55.63 kcal/mol). All binding energies are shown in Fig. 11.

Fig. 11
figure 11

Binding Free Energy calculation using MM-GBSA. (A) Complexes of FLT3-compounds and (B) Complexes of PIM1-compounds.

Calculated biological effect

3D-QSAR (Quantitative Structure–Activity Relationship) models serve as cost-effective and time-efficient alternatives to in vitro IC50 determination by establishing mathematical relationships between molecular structural features and biological activity, enabling rapid virtual screening of novel compounds without the need for resource-intensive laboratory testing89. The features that were considered during developing models were Gaussian Steric, Gaussian Electrostatic, Gaussian hydrophobic, Gaussian H-bond Acceptor, and Gaussian H-bond Donor. In the PIM1 model, the R2 and Q2 were 0.96 and 0.93, respectively, with a standard deviation of 0.22 only. The R2 CV (cross-validation), p-value, RMSE, and Pearson-r were 0.8097, 8.52e-29, 0.29, and 0.97, respectively. On the other hand, for the FLT3 model, the R2 and Q2 were 0.95 and 0.85, respectively, with a standard deviation of 0.29, R2 CV of 0.8850, the p-value of 9.32e-44, RMSE 0.48, and Pearson-r 0.93. In both of the models, the statistical measurement of R2 and Q2 indicates higher accuracy not only during learning but also while predicting unknown compounds. The training versus testing models have been presented in Fig. 12. Both active and inactive compounds were utilized for model development. IC50 < 10,000 nM has been considered an active compound according to the bioassay data of both the FLT3 and PIM1 from the PubChem Database biological activity declaration criteria67.

Fig. 12
figure 12

Establishment of Structure–Activity relationship. (A) and (C) represents the regression plot of real activity versus predicted activity for training datasets; (B) and (D) represent the regression plot of real activity versus predicted activity for testing datasets.

A comparative analysis of the predicted activities of two flavonoid compounds, Kaempferol and Apigenin, revealed promising inhibitory potential against FLT3. Notably, Kaempferol and Apigenin exhibit predicted activities (pIC50) of -2.693 (IC50: 493.17 nM) and -2.77 (588.84 nM), respectively, which are comparable to that of the control compound Pacritinib (Control_FLT3), a known FLT3 inhibitor with a predicted activity (pIC50) of -2.934 (IC50: 859.01 nM). On the other hand, for target PIM1, the predicted activity (pIC50) of Diosmetin was -2.719 (IC50: 523.60 nM), and Tricetin was -2.609 (IC50: 406.44 nM), showing stronger inhibitory activity comparable to the activity of control SEL24, which was -3.049 (IC50: 1119.44 nM). The lower the predicted activity value (pIC50), the more potent the compound is expected to be, and therefore, the calculated IC50 values of compounds were predicted to be more potent than control drugs. The IC50 values of all compounds are provided in Table 7.

Table 7 Predicted IC50 values of compounds of E. prostrata against targets FLT3 and PIM1.

Gene regulatory networks

Diosmetin and Tricetin could be effective against PIM1, and Kaempferol and Apigenin could also be effective against FLT3, but the regulation of genes PIM1 and FLT3 and their interactions with regulatory elements such as TFs and miRNAs can alter these effects. The analysis of miRNA interactions revealed that several miRNAs, including hsa-mir-335-5p, hsa-mir-125b-5p, and hsa-mir-26b-5p, exhibit a high degree of connectivity with the highest number of mRNA transcripts, suggesting their potential as post-transcriptional regulators of gene expression driving AML. Specifically, PIM1 exhibits a complex regulatory network, interacting with multiple microRNAs (hsa-mir-335-5p, hsa-mir-101-3p, hsa-mir-15a-5p, hsa-mir-16-5p, hsa-mir-192-5p, hsa-mir-26b-5p, hsa-mir-4434, and hsa-mir-5703) and transcription factors (TFs; ABL1, ERG, STAT1, and STAT4). In contrast, FLT3 displays a more limited interaction landscape, associating with only one TF (PML) and one microRNA (hsa-mir-150-5p). Finally, MPO interacts with a subset of microRNAs (hsa-mir-1324, hsa-mir-4716-5p, and hsa-mir-8485) and TFs (SP1 and RUNX1). Therefore, apart from only regulating mRNA expression, specific TFs and miRNAs could also be promising biomarkers for AML. Figure 13 shows the gene regulatory network, and the complete view of Transcription factors and miRNAs has been incorporated in Supplementary Fig. S9.

Fig. 13
figure 13

Regulation of AML gene expression influenced by Transcription Factors (TFs) and Micro RNAs (miRNAs). The green circles indicate Genes/mRNA transcripts, the red triangles indicate miRNAs, and the blue rectangles indicate TFs.

Identification and validation of novel protein biomarkers

Our study revealed two potential specific biomarkers, FLT3 and MPO, of AML. FLT3 is already an approved prognostic biomarker for AML. On the other hand, MPO exhibited significant upregulation in AML and was correlated with the overall survival difference. Notably, high expression levels of MPO were associated with increased overall survival in AML patients, underscoring its potential as a novel prognostic biomarker for AML. Furthermore, our results demonstrated that FLT3 was also significantly upregulated in AML and correlated with poor overall survival. The survival analysis revealed that high expression levels of FLT3 were associated with decreased overall survival in AML patients, reaffirming its prognostic significance in AML. In addition to their prognostic potential, both FLT3 and MPO may also possess diagnostic utility in AML. Our findings evaluate the diagnostic potential of FLT3, highlighting its ability to distinguish AML from other cancers. MPO, on the other hand, has been less extensively explored as a diagnostic biomarker for AML. However, our results suggest that MPO may also possess diagnostic potential, as its upregulation was specifically observed significantly in AML, while significantly downregulated in other cancers. Collectively, these findings highlight the potential of FLT3 and MPO as specific diagnostic and prognostic biomarkers for AML.

Additionally, our analysis suggested that KIT, PIK3CD, PIK3R1, and NFKB1/RELA may possess general cancer biomarker potential, as they were significantly upregulated in AML as well as other cancers. However, their lack of specificity to AML may limit their diagnostic utility. MAPK1, PIM1, and PIM2, on the other hand, demonstrated downregulation in AML and were associated with poor overall survival, suggesting their potential as prognostic biomarkers. However, further investigation is needed to determine their diagnostic potential. PIK3CB, despite being upregulated in AML, showed a significant difference in overall survival between low and high-protein expression groups, suggesting further exploration of its potential as a prognostic biomarker. Conversely, STAT3, PIK3CA, and AKT1 did not exhibit significant regulation in AML, suggesting that they may not be suitable biomarkers for AML diagnosis or prognosis. Table 8 shows the regulation of gene expression, while Figs. 14 and 15 show the significant patient survival analysis and gene expression analysis, respectively.

Table 8 Upregulated and downregulated expression patterns in AML.
Fig. 14
figure 14

Survival analysis among patients with AML using Kaplan–Meier Plots to identify potential prognostic biomarkers. The blue line indicates a low-level protein group, and the red line indicates high-level protein groups. The larger gaps between those two groups indicate significant survival differences among patients. (A-E) shows significant differences between survival with logrank p-value < 0.05, and (F) AKT1 is used as a negative control, which doesn’t show any survival difference.

Fig. 15
figure 15

Protein expression analysis among patients with AML to identify diagnostic biomarker potential of genes using box plots. The red box indicates expression in cancer patients, and the grey box indicates expression in healthy controls without cancer. The star mark (*) indicates a significant expression. (A-I) shows significant differences in high and low gene expression levels, and (J) AKT1 is a negative control used to compare the significance.

Cancer hallmarks for AML

The cancer hallmark graph demonstrated the involvement of the identified proteins in cancer development. The involvement of all promising initially identified proteins of AML from hub proteins, protein clusters, and pathways is represented in Fig. 16 A for all types of hallmarks of Cancer. A more precise graph has been generated in Fig. 16 B based on gene expression and patient survival data to acknowledge the relevance of previously identified biomarkers. The latter precise graph represented four specific and statistically significant (p < 0.05) hallmarks of cancer: Sustained Angiogenesis (PIM1, FLT3, and MAPK1), Programming Energy Metabolism (PIM1, FLT3, and MAPK1), Resisting Cell Death (PIM1, FLT3, MPO, and MAPK1), and Tumor Promoting Inflammation (MPO and MAPK1). The details of these findings have been incorporated in Supplementary Tables S2 and S3.

Fig. 16
figure 16

Identification of Cancer Hallmarks. (A) Cancer Hallmarks depict using proteins from hub proteins, protein clusters and pathway analysis (AKT1, KIT, PIM1, PIM2, FLT3, MAPK1, NFKB1, PIK3CA, PIK3CB, PIK3CD, PIK3R1, STAT3, MPO), (B) Cancer Hallmark depict using proteins from gene expression, and survival analysis (PIM1, FLT3, MPO, and MAPK1). Black color represents non-significant, and other colors represent significant. Apart from black color bars, all colorful bars represent statistically significant results, and large bars represent more statistically significant.

Discussion

AML is an aggressive and often fatal hematological malignancy, and therefore, the development of more effective targeted therapies and the identification of robust biomarkers is crucial95,96. E. prostrata is a medicinal plant that has proven anti-cancer activity, but the mechanism of action of E. prostrata against AML has not been explored yet97,98,99. The present structural and systems biology study provides valuable insights into the potential mechanisms of action of the medicinal plant E. prostrata against AML to reveal and identify potential therapeutic potentials and biomarker possibilities.

Our comprehensive analysis evaluated the 12 potential active anti-cancer compounds from E. prostrata and uncovered the two most promising therapeutic targets (FLT3 and PIM1) along with some promising anti-cancer compounds (Tricetin, Diosmetin, Kaempferol, Apigenin, etc.) against those targets. Furthermore, we identified two specific biomarkers for AML with both diagnostic and prognostic capabilities (FLT3 and MPO).

The findings of our study are consistent with previous research demonstrating the anticancer properties of E. prostrata and its potential therapeutic applications 98,99. However, the systems biology approach employed in our investigation provides a more comprehensive and systematic analysis of the underlying molecular mechanisms, expanding our understanding of the specific molecular targets and pathways, post-transcriptional regulation of targets, and diagnosis and prognosis capabilities of targets. This holistic evaluation enables the identification of possible novel biomarkers, therapeutic targets, and therapeutics that could guide the development of more effective and targeted treatment strategies for AML.

At the beginning of our study, we determined the Gene Ontology (GO) and KEGG Pathway enrichment analysis with 121 initially identified proteins that were common between proteins predicted from the compounds and AML-associated proteins, revealing significant enrichment in pathways related to cancer, including the “AML” pathway. The most enriched proteins in the AML pathway were FLT3, STAT3, KIT, PIM1, MAPK1, AKT1, PIM2, PIK3R1, MPO, and NFKB1. In addition, the enrichment in kinase-related activities, such as protein tyrosine kinase, serine/threonine kinase, and serine kinase functions, suggests that the E. prostrata compounds may predominantly target regulatory enzymes, which are commonly dysregulated in AML. This finding confirmed the relevance of the identified compounds and targets to the pathogenesis of AML and highlighted their potential as therapeutic targets.

Moreover, we constructed a PPI network with the same 121 identified proteins which highlighted several key hub targets (Proteins) including AKT1, ALB, STAT3, BCL2, CASP3, EGFR, HSP90AA1, ESR1, NFKB1, and PTGS2 based on five topological properties: degree centrality, closeness centrality, betweenness centrality, Maximum neighborhood component (MCC), Maximum clique centrality (MNC). These hub targets are known to play pivotal roles in various cellular processes, such as cell proliferation, apoptosis, and signal transduction, which are often dysregulated in AML. On the other hand, AKT1, STAT3, NFKB1, RELA, KIT, FLT3, MPO, and PIK3R1 were enriched in protein clusters that also resided in the AML pathway. Therefore, the identification of these hub proteins and protein clusters provided valuable insights into the potential mechanisms of action of the E. prostrata compounds and suggested that they may modulate these critical pathways by exerting their anti-leukemic effects.

Furthermore, our analysis revealed that several of the identified targets, including KIT, FLT3, PIK3CA, PIK3CD, and AKT1, have already been successfully targeted by FDA-approved anti-cancer drugs. Apart from that, we identified several emerging targets, such as STAT3, PIK3CB, PIM1, and PIM2, which are also currently undergoing clinical trials for some cancer types 47. These findings highlight the potential of these targets as valid therapeutic opportunities for AML (Table 5). Interestingly, among these targets, FLT3 and PIM1 are already FDA-approved and clinical-stage drug targets for the treatment of AML, respectively, confirming the validity, accuracy, and relevance of our study in unveiling the molecular mechanisms of compounds from E. prostrata.

After unveiling and validating the therapeutic targets, we focused on the verification of the therapeutic potential of our initially identified 12 compounds against those targets by employing molecular docking and molecular dynamics simulation approaches. Molecular docking scores revealed that Kaempferol and Apigenin exhibited strong binding affinities to FLT3, with values of -8.931 and -8.752 kcal/mol, respectively, significantly surpassing the control drug Pacritinib (-5.403 kcal/mol). Similarly, for PIM1, Tricetin (-8.634 kcal/mol) and Diosmetin (-7.780 kcal/mol) demonstrated enhanced binding compared to SEL24 (-6.385 kcal/mol). Molecular dynamics simulations further strengthen these findings, showing lower average RMSD values for the compound complexes (PIM1-Diosmetin: 2.02 Å; PIM1-Tricetin: 2.16 Å) compared to SEL24 (2.32 Å), indicating greater stability. For FLT3, the averages were also favorable (FLT3-Kaempferol: 2.90 Å; FLT3-Apigenin: 2.91 Å) versus Pacritinib (3.18 Å). The binding free energy calculations using MMGBSA further confirmed these observations, with Kaempferol (-73.75 kcal/mol) and Apigenin (-68.76 kcal/mol) exhibiting significantly stronger binding affinities than Pacritinib (-51.27 kcal/mol). For PIM1, Tricetin (-64.28 kcal/mol) outperformed Diosmetin (-52.2 kcal/mol) and SEL24 (-53.38 kcal/mol). Collectively, these results suggest that both FLT3 and PIM1, when targeted by compounds Kaempferol, Apigenin, Tricetin, and Diosmetin, not only demonstrate favorable binding characteristics but also exhibit enhanced stability and interaction dynamics, making them compelling candidates for therapeutic interventions in AML. Controls are selected as FDA-approved drugs (e.g., Pacritinib for FLT3 and SEL24 for PIM1) to benchmark and compare the binding affinities of novel compounds against established standards. The comparison of docking results with controls shows a consistently higher value than the control drugs for all targeted proteins. This superior binding affinity and stability of natural flavonoids over approved drugs highlight their promising potential as safer, effective novel therapeutics for AML through targeted inhibition.

In addition, our predictive models for both targets, PIM1 and FLT3, demonstrated robust performance, as evidenced by high R2 and Q2 values (PIM1: R2 = 0.96, Q2 = 0.93; FLT3: R2 = 0.95, Q2 = 0.85), indicating excellent performance between predicted and experimental activities. This strong predictive capability was further supported by low RMSE values and highly significant p-values. Focusing on the identified hit compounds, analysis revealed that for PIM1, Diosmetin (pIC50 = -2.719, IC50 = 523.60 nM) and Tricetin (pIC50 = -2.609, IC50 = 406.44 nM) exhibited comparable inhibitory potency to the control SEL24 (pIC50 = -3.049, IC50 = 1119.44 nM), suggesting promising inhibitory activity. Similarly, for FLT3, Kaempferol (pIC50 = -2.693, IC50 = 493.17 nM) and Apigenin (pIC50 = -2.77, IC50 = 588.84 nM) displayed inhibitory activities comparable to the control Pacritinib (pIC50 = -2.934, IC50 = 859.01 nM). The effect of pacritinib sensitivity in primary AML samples was observed in a range between IC50 2.0 nM to 1000 nM100. In our study, we observed an IC50 of 859.01 nM for FLT3, further suggesting the robust accuracy of our predicted model. On the other hand, the IC50 ranges were observed between 30 to 2944 nM for SEL24. The variation in IC50 was observed due to variations in different cell lineages and experimental conditions101. The IC50 of SEL24 observed in our study for PIM1 was 1119.44 nM, again indicating the robust performance of our predicted model. Since IC50 values of all the compounds were in the range of 58.34 nM to 1196.74 nM, which is less than 10,000, all the compounds were considered highly potent to inhibit the activity of targets.

The intricate interplay between microRNAs (miRNAs) and transcription factors (TFs) in regulating gene expression in AML also underscores their potential as pivotal biomarkers and therapeutic targets 58. Our analysis reveals that specific miRNAs, particularly hsa-mir-335-5p, hsa-mir-125b-5p, and hsa-mir-26b-5p, exhibit extensive connectivity with mRNA transcripts, highlighting their role as key post-transcriptional regulators that may drive AML pathogenesis and progression of the disease. Notably, PIM1 exhibited a highly connected regulatory network, interacting with multiple miRNAs (hsa-mir-335-5p, hsa-mir-101-3p, hsa-mir-15a-5p, hsa-mir-16-5p, hsa-mir-192-5p, hsa-mir-26b-5p, hsa-mir-4434, and hsa-mir-5703) and TFs (ABL1, ERG, STAT1, and STAT4), suggesting multiple and complex therapeutic interventions. In contrast, FLT3 exhibited a more limited interaction landscape, associating with only one TF (PML) and one miRNA (hsa-mir-150-5p), while MPO interacted with a distinct subset of miRNAs (has-mir-1324, hsa-mir-4716-5p, and hsa-mir-8485) and TFs (SP1 and RUNX1), indicating their unique biomarker potential. Overall, these findings highlight the potential of these specific TFs and miRNAs as additional therapeutic targets and biomarkers for AML.

Subsequently, the identification of FLT3 and MPO as potential biomarkers for AML further amplified the significance of our study. FLT3, a well-established prognostic biomarker for AML91, was also found to be specifically and significantly upregulated in AML, suggesting its diagnostic potential. On the other hand, MPO, which is not as extensively explored as a biomarker for AML, possesses both diagnostic and prognostic potential specifically for AML. The upregulation of MPO, specifically in AML, and its correlation with decreased overall survival, underscores its value as a potential novel biomarker for AML60.

Finally, the analysis of cancer hallmarks revealed that these specific biomarkers (FLT3 and MPO), along with PIM1 and MAPK1, are significantly implicated in pathways crucial for AML development, including sustained angiogenesis, programmed energy metabolism, resisting cell death, and tumor-promoting inflammation. The involvement of these biomarkers in multiple hallmarks established a particular AML-specific link between biomarkers and their therapeutic potential. Apart from FLT3 and MPO, other candidate biomarkers showed upregulation in both AML and other cancers, which limited their diagnostic utility. The AML-specific expression patterns of FLT3 and MPO, coupled with their association with key cancer hallmarks, strongly support further development as effective diagnostic and prognostic tools for improved AML patient stratification and targeted therapeutic interventions. Additionally, the development of a biomarker panel incorporating both FLT3 and MPO may enhance the accuracy of AML diagnosis and prognosis and may also enable the identification of high-risk AML patients, who may benefit from more aggressive treatment approaches, ultimately contributing to improved patient management and treatment strategies. The overall scenario of the AML-specific linked network and the mechanisms of E. prostrata against AML is shown in Fig. 17.

Fig. 17
figure 17

Overall Scenario of Disease-Targets-Drug candidates. (A) Network of E. prostrata in the management of AML (Created using Cytoscape, version 3.10.3: https://cytoscape.org/). Dark Blue Circle Shape = Potential Drug Targets; Green Diamond Shape and Red Circle Shape = Most Promising Targets; Green and Orange Diamond Shape = Promising Biomarker Candidates; Light Blue, Light Green, and Light Red Shapes = Drug Candidates. (B) Mechanisms of E. prostrata against AML102,103 (Created using Canva, free version: https://www.canva.com/). Compounds derived from Eclipta prostrata, including Apigenin, Acacetin, Diosmetin, Tricetin, Coumestan, Kaempferol, and Kaempferide, exert potential anti-AML effects by targeting key proteins such as FLT3, AKT1, MAPK1, and PIM1. The receptor tyrosine kinases KIT and FLT3, located on the cell membrane, collaboratively inhibit apoptosis by modulating both the PI3K-AKT and JAK-STAT signaling pathways. Concurrently, KIT and FLT3 promote cellular proliferation through the MAPK signaling pathway. Therapeutic inhibition of FLT3, AKT1, MAPK1, and PIM1 represents an optimal strategy to suppress AML progression.

The main limitation of our study is the reliance on in-silico analyses only, which, while providing valuable insights, require experimental validation to confirm the biological relevance and therapeutic potential of the identified compounds and targets. Despite this limitation, our study is still extremely significant due to the development of robust computational models (PIM1: R2 = 0.96 and FLT3: R2 = 0.95) from already experimentally validated bioassay data of both FLT3 and PIM1. Both of the models have predicted the biological activity of compounds against therapeutic targets with higher predictive power (PIM1: Q2 = 0.93 and FLT3: Q2 = 0.85) in the form of IC50 values. This complementary computational method established mathematical relationships between molecular structural features and biological activity, allowing for quick virtual screening of new compounds without requiring resource-intensive laboratory testing. Even though a complete replacement of the in vitro assay is not possible, it is still utilized as a time and cost-efficient substitute for efficient in vitro biological activity determination89. In this study, the 3D-QSAR models predict the biological activity (IC50 values) of unknown compounds from known in vitro bioassay data, supporting our study and covering up the lack of an experimental validation study to a great extent.

Moreover, to our knowledge, it is the first time for a study to screen and identify the most promising active anti-cancer compounds from E. prostrata based on chemical and electronic properties, coupled with the elucidation of their mechanism of action against AML. These examinations provide significant strength to our study by exhibiting valuable insights into promising clinical significance due to their therapeutic potential, together with diagnostic and prognostic possibilities. Hence, the findings of our study present opportunities to explore the potential of E. prostrata-derived compounds as therapeutic agents, either alone or in combination with standard AML treatments, to improve patient outcomes104,105.

Our study pointed to several future research avenues and directions. Due to the promising activities and favorable properties of 12 identified compounds, the available derivatives of each of the 12 compounds could also exert better promising outcomes than the original. Additionally, the identified targets also need more extensive study in the case of AML. Our study is limited to in silico studies, and therefore, further in vitro and in vivo experimental validation, such as cell line assays and gene expression analysis, is necessary to confirm the biological activities of promising compounds to advance and clinical implications of this study. Furthermore, the discovery of novel biomarker potentials of FLT3 and MPO for both specific diagnostic and prognostic possibilities could enable the development of more accurate diagnostic and prognostic tools and facilitate personalized treatment approaches for AML needs further experimental validation.

Conclusion

This in-silico study demonstrates the therapeutic potential of E. prostrata against AML by elucidating its mechanism of action. Our integrated computational approach identified Kaempferol and Apigenin as promising FLT3 inhibitors and Tricetin and Diosmetin as potent PIM1 inhibitors, supported by strong binding affinities and binding free energies. The predicted pIC50 values (-1.766 to -3.049) developed using robust 3D-QSAR models further validated our findings. The identification of key regulatory elements, including specific microRNAs (hsa-mir-335-5p, hsa-mir-125b-5p, and hsa-mir-26b-5p) and transcription factors (PML, SP1, and RUNX1), and the potential diagnostic and prognostic biomarkers (FLT3 and MPO) significantly advances our understanding of E. prostrata's anti-AML activity. Due to the in-silico limitations of our study, these findings suggest further in vitro and in vivo investigations, such as cell line assays and gene expression analysis, to validate the efficacy and safety of these compounds, paving the way for the development of novel AML therapies and exploring combinatorial strategies to enhance the clinical efficacy against this aggressive malignancy. Additionally, further research into FLT3 and MPO as biomarkers could significantly advance AML diagnosis and prognosis and open up new avenues for personalized medicine and targeted therapy development.