Introduction

Colorectal cancer (CRC) is cancer of the colon or rectum. Every year, more than 1.8 million new cases of CRC and about 850,000 deaths are reported. Previous studies indicate that CRC is the second leading cause of cancer-related death among all cancers1. The incidence of CRC has increased in the recent past, and it is estimated that CRC deaths may increase by 70% by 20352. Colon and rectal adenomas can progress to cancer through genetic and epigenetic changes that accumulate over time in the originally healthy cells of the colon and rectum.

The prevalence of CRC in eastern and western countries has been discussed in previous studies3. In eastern countries, CRC has been reported to be more prevalent in economically developed eastern regions and less prevalent in less developed western regions. In China, approximately 20% of patients with CRC are younger than 30 years, and the disease develops about 12 years earlier than in Western countries4.

From a molecular standpoint, about 20% of CRC cases are attributed to genetic changes, and first-degree relatives of CRC patients are at significantly increased risk5. Hereditary conditions that increase the risk of CRC include familial adenomatous polyposis and mutations in mismatch repair genes6. CRC arises from a combination of causes and presents with diverse symptoms7. Chromosomal instability is the main molecular pathway whose dysregulation can lead to CRC8. It involves changes in chromosome structure and number, including gains or losses of chromosome segments, as well as alterations in tumor suppressor genes.

Several mutations can lead to CRC. These mutations can occur in tumor suppressor genes, such as TP53 (p53) and APC, or in oncogenes, such as KRAS and BRAF9. These genes regulate cell division and deoxyribonucleic acid (DNA) replication and are crucial for the initiation and progression of the disease, in part by driving cell proliferation. Although treatments such as chemotherapy (FOLFIRINOX) and surgery can improve survival, their effectiveness in advanced CRC remains limited.

RAS genes encode proteins that help regulate cell signaling. In 1975, researchers at the National Institutes of Health, led by Scolnick, discovered the first two RAS genes, HRAS and KRAS, while studying cancer-causing viruses10. The human versions of these genes were identified in 1982 and have since been linked to many types of cancer. Among the three human RAS genes, KRAS is the one most closely associated with cancer11, and KRAS mutations are present in approximately 40–52% of CRC cases12. KRAS is located on chromosome 12 and encodes a GTPase of the RAS family13. The protein comprises 189 amino acids, has a molecular weight of about 21 kDa, and binds GDP and GTP14; numerous mutations in it have been reported15. These mutations cause constitutive activation of the KRAS protein. KRAS acts as a molecular switch, and in its permanently active state it continuously stimulates downstream signaling pathways, including those that promote cell survival and proliferation, ultimately leading to tumor formation16.

Several genetic and molecular factors influence the development and progression of CRC. Among the signaling pathways involved, the MAPK/ERK and PI3K/Akt/mTOR pathways are two of the most important. The MAPK/ERK pathway is frequently overactive in CRC, often because of mutations in KRAS. This pathway is involved in cancer cell growth, drug resistance, inflammation, apoptosis, and metastasis, and inhibitors targeting it have shown promising anticancer effects. Fruquintinib (brand name Fruzaqla) is an anticancer drug currently used to treat CRC17,18. This study was undertaken to identify potential phytochemical inhibitors of the KRAS GTPase in CRC.

Methodology

3D structure retrieval and verification

3D structure retrieval

To retrieve the structure of KRAS, which is overexpressed in CRC, the Universal Protein Resource (UniProt) database (https://www.uniprot.org/) was first consulted for detailed functional information; UniProt provides extensive data on proteins and their functional annotations19. The structure of the KRAS GTPase (189 amino acids; molecular weight approximately 21.6 kDa) was then retrieved from the Protein Data Bank (PDB) under RCSB-PDB ID 7SCW, which has a resolution of 1.98 Å. The PDB (https://doi.org/10.2210/pdb7SCW/pdb) is a key resource for studying the 3D structures of biological macromolecules and supports research and education in many fields20.
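
As an illustration of this retrieval step, the following minimal Python sketch builds the RCSB download URL for an entry and reads the resolution from the REMARK 2 record of a PDB-format file. It assumes the public files.rcsb.org download endpoint and PDB-format (rather than mmCIF) files; the function names are hypothetical and are not part of any tool used in the study.

```python
import re
import urllib.request
from typing import Optional

# Hypothetical helpers; assumes the public RCSB file-download endpoint.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{}.pdb"
RESOLUTION_RE = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", re.MULTILINE
)

def rcsb_download_url(pdb_id: str) -> str:
    """Return the PDB-format download URL for a four-character PDB ID."""
    return RCSB_DOWNLOAD.format(pdb_id.upper())

def fetch_pdb(pdb_id: str) -> str:
    """Download a PDB-format entry as text (requires network access)."""
    with urllib.request.urlopen(rcsb_download_url(pdb_id), timeout=30) as response:
        return response.read().decode("utf-8")

def parse_resolution(pdb_text: str) -> Optional[float]:
    """Return the resolution in angstroms from REMARK 2, or None if absent."""
    match = RESOLUTION_RE.search(pdb_text)
    return float(match.group(1)) if match else None

if __name__ == "__main__":
    header = "REMARK   2\nREMARK   2 RESOLUTION.    1.98 ANGSTROMS.\n"
    print(rcsb_download_url("7scw"))  # https://files.rcsb.org/download/7SCW.pdb
    print(parse_resolution(header))   # 1.98
```

In practice, `parse_resolution(fetch_pdb("7SCW"))` would combine both steps; the offline example above avoids a network call.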

Structure verification

After retrieval, the structure was validated with Verify 3D21, ERRAT22, and PROCHECK23 to evaluate, respectively, the compatibility of the model with its amino acid sequence, the overall quality of the structure, and its geometric (stereochemical) quality (Scheme 1).

Scheme 1

Summary of the process flow.

Ligand and protein preparation

The receptor was prepared using Discovery Studio Visualizer (DSV)24. The protein structure retrieved from the PDB was cleaned and edited in DSV, and all heteroatoms and water molecules were removed from the KRAS structure. The cleaned protein was then prepared with AutoDock Tools (version 1.5.7) and converted from PDB to PDBQT format25.
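
The clean-up step (removing heteroatoms and waters) can be sketched as a simple filter over PDB records. This minimal Python illustration assumes fixed-column PDB format; it is not how Discovery Studio implements the operation, and it drops every HETATM record, including bound nucleotides, ions, and modified residues, which may not always be desired.

```python
# Residue names commonly used for water in PDB files (assumed list).
WATER_NAMES = {"HOH", "WAT", "DOD"}

def clean_receptor(pdb_lines):
    """Keep protein ATOM records (plus TER/END); drop HETATM records and waters."""
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()       # columns 1-6: record name
        res_name = line[17:20].strip()  # columns 18-20: residue name
        if record == "ATOM" and res_name not in WATER_NAMES:
            kept.append(line)
        elif record in ("TER", "END"):
            kept.append(line)
    return kept

if __name__ == "__main__":
    lines = [
        "ATOM      1  N   MET A   1",
        "HETATM 1300  O   HOH A 201",
        "HETATM 1301  C1  LIG A 202",  # hypothetical bound ligand
        "TER",
        "END",
    ]
    print(clean_receptor(lines))  # keeps the ATOM, TER, and END lines
```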

Ligand library

A total of 17,967 phytochemicals were obtained from the Indian Medicinal Plants, Phytochemistry and Therapeutics 2.0 (IMPPAT 2.0) database (https://cb.imsc.res.in/imppat/)26. These ligands were filtered according to the ADMET criteria given in Table 1, and 95 compounds that fulfilled all of these criteria were retained26,27. The top ten ligands among these 95 compounds, whose ADMET properties are listed in Table 2, fulfilled all of these drug-likeness criteria.
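
A rule-based ADMET pre-filter of the kind described here can be sketched as follows. Because Table 1 is not reproduced in this sketch, the thresholds are assumptions taken from Lipinski's rule of five (molecular weight ≤ 500 Da, logP ≤ 5, ≤ 5 hydrogen-bond donors, ≤ 10 hydrogen-bond acceptors), and the compound records are hypothetical; in practice the descriptor values would come from IMPPAT 2.0 or a property-prediction tool.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float  # Da
    logp: float        # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int

# Assumed thresholds (Lipinski's rule of five), not the study's Table 1.
UPPER_LIMITS = {
    "mol_weight": 500.0,
    "logp": 5.0,
    "h_donors": 5,
    "h_acceptors": 10,
}

def passes_all(compound: Compound) -> bool:
    """True if every property is at or below its upper limit."""
    return all(getattr(compound, prop) <= limit for prop, limit in UPPER_LIMITS.items())

def admet_filter(compounds):
    """Keep only compounds that satisfy all criteria (no violations allowed)."""
    return [c for c in compounds if passes_all(c)]

if __name__ == "__main__":
    library = [  # hypothetical descriptor values
        Compound("A", 350.0, 2.1, 2, 5),
        Compound("B", 612.5, 1.0, 4, 9),
        Compound("C", 420.0, 5.8, 1, 6),
    ]
    print([c.name for c in admet_filter(library)])  # ['A']
```

Lipinski's original rule tolerates one violation; requiring every criterion mirrors the statement that the retained compounds fulfilled all criteria.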

Table 1 ADMET properties used.

Ligand preparation

Ligand preparation was performed in Discovery Studio Visualizer24 to ensure accurate docking. First, the chemical structures of the ligands were imported and formatted in the software, and hydrogen atoms and partial charges were added. The ligands were then converted into the file format required for docking (e.g., PDBQT). Finally, the prepared ligands were checked in Discovery Studio to confirm their suitability for docking.

Active site determination

The active site for docking was determined using Discovery Studio24 and CASTp28. First, the 3D structure of the receptor was obtained from the PDB and analyzed in Discovery Studio24. The 2D interaction diagram in Discovery Studio was used to identify the active site and its amino acid residues, and CASTp28 identified the same residues, confirming the site where the ligand binds to the receptor.
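
Taking the residues reported by both tools amounts to a set intersection. The minimal Python sketch below assumes residue labels of the form three-letter code plus residue number (e.g., GLY12); the input sets are illustrative and are not the full outputs of Discovery Studio or CASTp.

```python
def residue_number(label: str) -> int:
    """Extract the residue number from a label such as 'GLY12'."""
    return int(label[3:])

def consensus_residues(*residue_sets):
    """Return residues reported by every tool, sorted by residue number."""
    common = set.intersection(*(set(s) for s in residue_sets))
    return sorted(common, key=residue_number)

if __name__ == "__main__":
    dsv = {"GLY12", "GLY13", "LYS16", "THR35", "GLN61", "LYS117"}  # illustrative
    castp = {"GLY12", "LYS16", "TYR32", "GLN61", "LYS117"}         # illustrative
    print(consensus_residues(dsv, castp))  # ['GLY12', 'LYS16', 'GLN61', 'LYS117']
```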

Grid box preparation

After the binding site was identified, a grid box was prepared around it. The size of the box and the spacing of its grid points were adjusted to ensure accurate positioning and evaluation of the ligands29. The grid information was then saved in a format (pdbqt) readable by the docking software for studying ligand-receptor interactions. For KRAS, the grid box was centered at x = 29.224 Å, y = 22.434 Å, and z = −6.244 Å, with box sizes of 24, 26, and 24 along x, y, and z. The exhaustiveness thresholds were set at 8 and 0, respectively.
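
The docking engine is not named beyond AutoDock Tools. Assuming AutoDock Vina was used (its configuration format has center_x/y/z, size_x/y/z, and exhaustiveness keys), the grid box above could be written to a configuration file as in this Python sketch. The file names are hypothetical, the box size is taken to be in ångströms as Vina expects, and the second threshold value (0) is not mapped because the text does not say which parameter it refers to.

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=8):
    """Render an AutoDock Vina-style configuration file as text (assumed engine)."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical file names; grid values as reported for KRAS.
    config = vina_config(
        receptor="kras_7scw.pdbqt",
        ligand="ligand.pdbqt",
        center=(29.224, 22.434, -6.244),
        size=(24, 26, 24),
    )
    print(config)
```

Saved as, for example, kras_box.txt, such a file could be passed to the Vina command-line program with `vina --config kras_box.txt`.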

Molecular docking

The selected phytochemicals were docked against the KRAS receptor using AutoDock Tools (version 1.5.7)25. After the 95 phytochemicals that met the ADMET requirements were docked, the 10 ligands with the lowest binding energies (highest affinities) were selected for further study30. Their interactions with the receptor, including hydrogen bonds, hydrophobic interactions, and van der Waals forces, were then visualized and analyzed with BIOVIA Discovery Studio Visualizer31. The ligand with the lowest binding energy was selected for further analysis.
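
Selecting the 10 lowest-energy ligands can be sketched as parsing each docking output and sorting. The sketch assumes AutoDock Vina PDBQT output, in which each pose carries a "REMARK VINA RESULT:" line and the first pose is the best-scoring one; the ligand IDs and all scores except -9.7 kcal/mol are hypothetical.

```python
import re

# First number after the tag is the binding score in kcal/mol (assumed Vina output).
VINA_RESULT = re.compile(r"^REMARK VINA RESULT:\s+(-?\d+(?:\.\d+)?)", re.MULTILINE)

def best_score(pdbqt_text: str) -> float:
    """Return the score of the first (best-ranked) pose in a Vina output file."""
    match = VINA_RESULT.search(pdbqt_text)
    if match is None:
        raise ValueError("no 'REMARK VINA RESULT' line found")
    return float(match.group(1))

def top_n(scores: dict, n: int = 10):
    """Sort ligand IDs by binding energy (most negative first) and keep n."""
    return sorted(scores.items(), key=lambda item: item[1])[:n]

if __name__ == "__main__":
    output = (
        "MODEL 1\nREMARK VINA RESULT:      -9.7      0.000      0.000\nENDMDL\n"
        "MODEL 2\nREMARK VINA RESULT:      -9.1      1.824      2.517\nENDMDL\n"
    )
    scores = {"CID_A": -9.0, "CID_23724664": best_score(output), "CID_B": -8.2}
    print(top_n(scores, 2))  # [('CID_23724664', -9.7), ('CID_A', -9.0)]
```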

Reference compound molecular docking

Fruquintinib, an FDA-approved anticancer drug, was used as a reference compound against KRAS for comparison17. Its structure (PubChem CID 44480399) was retrieved in SDF format from PubChem (https://pubchem.ncbi.nlm.nih.gov/compound/Fruquintinib). The compound was opened in MGLTools, Gasteiger charges and polar hydrogens were added25, and it was saved in PDBQT format. The prepared reference compound was docked against the KRAS receptor using the same grid box dimensions and parameters as for the phytochemicals, and the receptor-ligand interactions were visualized with BIOVIA Discovery Studio Visualizer31. Fruquintinib showed a binding energy of -9.4 kcal/mol, whereas the top IMPPAT ligand showed -9.7 kcal/mol, suggesting that the latter may be a potential candidate against colorectal cancer (CRC).
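
To put the 0.3 kcal/mol difference between the two scores in perspective, it can be converted into an implied ratio of dissociation constants using ΔG = RT ln Kd. This Python sketch assumes that docking scores behave like binding free energies at 298.15 K, which is only a rough approximation for scoring functions; it is an illustration, not part of the study's analysis.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_delta_g(score_candidate: float, score_reference: float) -> float:
    """Difference in binding score, candidate minus reference (kcal/mol)."""
    return score_candidate - score_reference

def implied_kd_ratio(ddg: float, temperature: float = 298.15) -> float:
    """Kd(reference) / Kd(candidate) implied by ddg, using dG = RT ln Kd."""
    return math.exp(-ddg / (R_KCAL * temperature))

if __name__ == "__main__":
    ddg = delta_delta_g(-9.7, -9.4)  # CID_23724664 vs. fruquintinib
    print(round(ddg, 2))             # -0.3
    print(round(implied_kd_ratio(ddg), 2))
```

Under these assumptions the ratio is about 1.66, i.e., less than a twofold difference in predicted affinity.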

Density functional theory

The ten ligands with the lowest binding energies were subjected to DFT analysis using Gaussian 09. The energy gap (ΔE)32 is the difference between the energies of the lowest unoccupied molecular orbital (LUMO) and the highest occupied molecular orbital (HOMO), ΔE = ELUMO − EHOMO. The geometries of the compounds were optimized at the DFT/B3LYP level with the 6-31G basis set33,34, and the resulting checkpoint (.chk) files were visualized in GaussView (version 6.0)35.
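
The energy gap itself is a simple subtraction, but units matter: Gaussian reports orbital eigenvalues in hartree (atomic units), so a conversion is needed before quoting ΔE in eV. The Python sketch below uses hypothetical HOMO and LUMO values; if the ΔE of 0.19685 reported in the Results were in hartree, it would correspond to about 5.36 eV.

```python
HARTREE_TO_EV = 27.211386245988  # CODATA 2018 value of the hartree energy in eV

def homo_lumo_gap(e_homo: float, e_lumo: float) -> float:
    """Energy gap dE = E_LUMO - E_HOMO, in the same units as the inputs."""
    return e_lumo - e_homo

def hartree_to_ev(value_hartree: float) -> float:
    """Convert an energy from hartree to electronvolts."""
    return value_hartree * HARTREE_TO_EV

if __name__ == "__main__":
    gap = homo_lumo_gap(e_homo=-0.22, e_lumo=-0.02)  # hypothetical values, hartree
    print(round(gap, 5), "Ha =", round(hartree_to_ev(gap), 3), "eV")
    print(round(hartree_to_ev(0.19685), 3))          # 5.357
```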

Molecular dynamics simulation

To evaluate the stability of the protein-ligand complex, a 100 ns molecular dynamics (MD) simulation was performed36. Root-mean-square deviation (RMSD) was used to monitor structural changes, root-mean-square fluctuation (RMSF) to evaluate residue flexibility, the radius of gyration (Rg) to measure compactness, and hydrogen bond formation to assess interaction stability36.
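
For reference, the RMSD of a frame relative to a reference structure is the square root of the mean squared atomic displacement. This minimal Python sketch assumes the frames have already been superimposed on the reference (MD packages usually perform a least-squares fit first), so no fitting is done here; coordinates are plain (x, y, z) tuples rather than any particular trajectory format.

```python
import math

def rmsd(frame, reference):
    """RMSD between two pre-aligned frames of (x, y, z) coordinates."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("frames must contain the same, non-zero number of atoms")
    squared = sum(
        (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
        for (x, y, z), (xr, yr, zr) in zip(frame, reference)
    )
    return math.sqrt(squared / len(frame))

if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(frame, reference))  # about 1.414 (square root of 2)
```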

Results

Structure retrieval

The protein structure retrieved from the PDB (RCSB-PDB ID: 7SCW; resolution 1.98 Å) is shown in Fig. 1.

Fig. 1

3D structure of the KRAS GTPase (PDB ID: 7SCW, resolution: 1.98 Å). Made with Discovery Studio Visualizer (2021 edition).

Structure verification

The protein structure was verified with Verify 3D, ERRAT, and PROCHECK. Ramachandran plot analysis with PROCHECK showed that 92.9% of the residues of the KRAS GTPase were in the most favored regions and 7.1% were in additionally allowed regions (Fig. 1S)23. The ERRAT quality score was 96.12 (Fig. 2S)22, and Verify 3D gave the structure a "pass" status (Fig. 3S)21.

Screened ligands

Ligands were screened on the basis of ADMET properties, which are essential for evaluating the pharmacokinetic and toxicological properties of drug candidates. Of the 95 compounds retained from the ligand library, the top 10 with the lowest binding energies are listed in Table 2 with their ADMET properties, all of which fall within the recommended ranges (Pollastri, 2010).

Table 2 Summary of the ADME properties of the top 10 compounds having high binding affinity.

Active site determination

The active site for docking was determined using Discovery Studio Visualizer24 and CASTp28. The residues of the KRAS GTPase identified by both tools were GLY12, GLY13, VAL14, GLY15, LYS16, SER17, ALA18, PHE28, VAL29, PRO34, THR35, THR58, GLY60, GLN61, ASN116, LYS117, ASP119, LEU120, SER145, ALA146, and LYS147. These common residues were taken to define the KRAS active site to which the ligands bind.

Docking analysis

The 95 phytochemicals that passed the ADMET filters were docked against the KRAS receptor. The IUPAC names, PubChem IDs, structures, and chemical formulae of the top 10 ligands, which have the lowest binding energies (highest binding affinities) for KRAS, are given in Tables 2 and 3; the remaining ligands are listed in the additional data. The binding energies of these top 10 ligands ranged from -9.7 kcal/mol to -9.0 kcal/mol, and their 2D representations are shown in Fig. 4S.

Table 3 Compounds selected after docking with KRAS (arranged in order of increasing binding energies).

KRAS molecular docking analysis

Compound 1, (2S,3R,4S,5S,6R)-2-[[(6aR,11aR)-9-methoxy-6a,11a-dihydro-6H-benzofuro[3,2-c]chromen-3-yl]oxy]-6-(hydroxymethyl)oxane-3,4,5-triol (PubChem CID 23724664), had the lowest binding energy, -9.7 kcal/mol. Its interactions with the KRAS active site were visualized with BIOVIA Discovery Studio Visualizer. As shown in Fig. 2, the ligand forms pi-alkyl interactions with ALA 18 and ALA 146, a pi-cation interaction with LYS 117, hydrogen bonds with THR 35 and GLN 61, and van der Waals contacts with GLY 12, THR 58, GLY 60, GLY 14, ALA 59, LYS 16, PRO 34, ASP 33, GLU 31, ASP 30, SER 17, TYR 32, GLY 15, ASN 116, ASP 119, and LYS 47. In a pi-alkyl interaction, the pi-electron cloud of an aromatic ring interacts with an alkyl group. ALA 18 forms three pi-alkyl bonds with the aromatic rings of the ligand, at distances of 5.17 Å, 4.91 Å, and 5.05 Å, and ALA 146 forms two pi-alkyl bonds, at 3.94 Å and 4.88 Å. LYS 117 forms three bonds, at 4.97 Å, 4.74 Å, and 4.11 Å, which are of two types: pi-cation and pi-pi T-shaped. The bond angles and binding energies are listed in Table 1S and Table 2S, respectively.

Fig. 2

(i) Compound 1 bound to the active site of KRAS, with a zoomed-in view on the right. The interaction between compound 1 and KRAS is shown in two ways: (A) a two-dimensional view of the interacting residues and bond distances, and (B) a three-dimensional view of the same. Made with Discovery Studio Visualizer (2021 edition).

KRAS docking analysis with fruquintinib

In this study, fruquintinib served as the reference compound. The selected ligand (CID_23724664) bound KRAS more strongly, with a binding energy of -9.7 kcal/mol, than fruquintinib, which showed a binding energy of -9.4 kcal/mol. The interacting residues were visualized in DSV31, which showed that active-site residues were also involved in the interaction with fruquintinib (Fig. 3). These interactions included conventional hydrogen bonds with ASP 33 and GLU 31, a pi-alkyl interaction with PRO 34, and van der Waals contacts with ASP 33 and GLN 61. PRO 34 forms its pi-alkyl bond at a distance of 2.57 Å.

Fig. 3

(i) Interaction of KRAS with the reference ligand, fruquintinib; the right-hand image shows a closer view of the interaction. (ii) Interaction between fruquintinib and KRAS: (A) 2D interaction diagram showing the types of bonding interactions; (B) 3D interaction diagram showing the interacting residues and distances. Made with Discovery Studio Visualizer (2021 edition).

DFT analysis

Figure 4 illustrates the energy gap (ΔE) for each of the ten ligands selected after docking, calculated as the difference between ELUMO and EHOMO32. For CID_23724664, ΔE is 0.19685 eV, a value that provides information about the reactivity of the ligand35. This low ΔE suggests that the compound is chemically reactive. Although a low HOMO–LUMO gap can indicate that a molecule is more chemically reactive and potentially more likely to interact with other molecules, it does not directly reflect how strongly the molecule binds to a protein. Binding affinity is more accurately estimated with methods such as molecular docking scores or binding free energy calculations, which account for the overall interaction between the ligand and the protein.

Fig. 4

DFT analysis of compound CID 23724664 (ΔE = 0.19685 eV): (A) lowest unoccupied molecular orbital (ELUMO); (B) highest occupied molecular orbital (EHOMO). Made with GaussView 6.0.

Molecular dynamics simulation results

The RMSD plot (Fig. 5) shows that the protein-ligand complex underwent some structural rearrangement at the start of the simulation and then stabilized over time. In the second half of the trajectory, the RMSD indicates that the complex remained stable. Overall, the analysis supports the stability of the protein-ligand complex.

Fig. 5

RMSD analysis over the 100 ns trajectory.

Fig. 6 shows the root-mean-square fluctuation (RMSF) of the protein along the trajectory, highlighting the flexibility of individual amino acid residues of the KRAS GTPase and providing insight into the structural dynamics of the protein. In the first 1000 frames, some regions of the protein showed high flexibility, indicated by peaks; however, these regions did not include active-site residues. Some changes were also observed in the last 1000 frames: beyond residue 50, two peaks appeared at residues 55 and 60, and a slight change in RMSF was observed at residue 130. Overall, the structure remained consistent.
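
RMSF is computed per atom (or per residue, typically using C-alpha atoms) as the root-mean-square deviation of its position from its time-averaged position. The Python sketch below assumes a pre-aligned trajectory stored as a list of frames of (x, y, z) tuples; the comparison described for Fig. 6 would correspond to calling it on the first and last 1000 frames.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF over frames; trajectory[f][a] = (x, y, z), pre-aligned."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for a in range(n_atoms):
        # Time-averaged position of atom a.
        mean = [sum(frame[a][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that average position.
        msd = sum(
            sum((frame[a][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

if __name__ == "__main__":
    toy = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]]
    print(rmsf(toy))  # [0.0, 1.0]
    # For a real trajectory: rmsf(trajectory[:1000]) and rmsf(trajectory[-1000:])
```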

Fig. 6

RMSF analysis of the protein-ligand complex over the first and last 1000 frames.

The Rg plot (Fig. 7) shows only small variations in the compactness of the KRAS GTPase. Rg measures the compactness of a protein structure, with higher values indicating a looser structure, and therefore also reflects the stability and native conformation of the protein. The protein remained compact throughout the simulation, with an average Rg of about 20 Å.
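
The radius of gyration is the mass-weighted root-mean-square distance of atoms from their center of mass, Rg = sqrt(sum_i m_i |r_i - r_com|^2 / sum_i m_i). This Python sketch assumes plain coordinate tuples and uses unit masses unless masses are supplied; it is illustrative and not the routine of any specific MD package.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted Rg of (x, y, z) coordinates; unit masses if none given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Center of mass.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    # Mass-weighted squared distances from the center of mass.
    squared = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3)) for m, c in zip(masses, coords)
    )
    return math.sqrt(squared / total)

if __name__ == "__main__":
    print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
```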

Fig. 7

Radius of gyration (Rg) of the protein over the trajectory.

Fig. 8 shows the total number of hydrogen bonds formed during the simulation. The number of hydrogen bonds ranged from about 30 to 80, with an average of approximately 60 to 70 over the trajectory. Because the number of hydrogen bonds in a complex can reflect the stability of the protein-ligand combination, this result suggests a favorable interaction between the protein and the proposed ligand.
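
The article does not state the geometric criteria used to count hydrogen bonds. A common choice, and the default of the GROMACS gmx hbond tool, is a donor-acceptor distance of at most 3.5 Å and a hydrogen-donor-acceptor angle of at most 30°; the Python sketch below applies those assumed cut-offs to donor, hydrogen, and acceptor coordinates and averages the count over frames.

```python
import math

def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, max_angle=30.0):
    """Geometric H-bond test: D...A distance (Å) and H-D...A angle (degrees)."""
    d_to_a = [a - d for a, d in zip(acceptor, donor)]
    d_to_h = [h - d for h, d in zip(hydrogen, donor)]
    dist = math.sqrt(sum(v * v for v in d_to_a))
    if dist == 0.0 or dist > max_dist:
        return False
    norm_dh = math.sqrt(sum(v * v for v in d_to_h))
    cos_angle = sum(p * q for p, q in zip(d_to_a, d_to_h)) / (dist * norm_dh)
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= max_angle

def count_hbonds(triplets):
    """Count (donor, hydrogen, acceptor) coordinate triplets that pass the test."""
    return sum(1 for d, h, a in triplets if is_hbond(d, h, a))

def mean_hbonds(frames):
    """Average H-bond count over frames, each frame being a list of triplets."""
    return sum(count_hbonds(f) for f in frames) / len(frames)

if __name__ == "__main__":
    frame = [
        ((0, 0, 0), (1, 0, 0), (2.9, 0, 0)),  # linear, 2.9 Å: counted
        ((0, 0, 0), (1, 0, 0), (0, 3, 0)),    # 90 degrees: rejected
        ((0, 0, 0), (1, 0, 0), (4, 0, 0)),    # 4.0 Å: rejected
    ]
    print(count_hbonds(frame))  # 1
```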

Fig. 8

Number of hydrogen bonds observed over the trajectory.

Discussion

Colorectal cancer (CRC) is prevalent globally and is a major health problem worldwide, and both its incidence and mortality rates are expected to rise34. When diagnosed early, CRC can often be treated effectively, but limited access to early diagnosis and treatment in developing regions underscores the need for better management strategies. Mutations in the KRAS GTPase account for 30–50% of CRC cases37 and can lead to rapid cell division and increased cell survival. Most of these mutations affect codons 12, 13, and 61 of KRAS38, resulting in constitutive activation of the PI3K-AKT-mTOR and RAF-MEK-ERK pathways, which are crucial for tumor progression.

Previous studies have reported that fruquintinib is an anticancer drug currently used to treat colorectal cancer17. The present study identified CID_23724664 (medicarpin 3-O-glucoside; https://pubchem.ncbi.nlm.nih.gov/) as a potential alternative to this drug for treating CRC because of its strong predicted binding to the target protein (-9.7 kcal/mol), which suggests that it may inhibit the pathways that drive CRC. A density functional theory (DFT) study was carried out to assess the reactivity of the ligands39. In past studies, low ΔE values have been associated with high binding affinity and stable interactions between a molecule and its target protein. In this study, the ΔE of 0.19685 eV is low, which suggests that the ligand may bind effectively to the target receptor39. A phytochemical library was chosen because phytochemicals offer advantages over synthetic compounds, including low toxicity and the potential to target multiple proteins for improved efficacy26.

A 2023 study evaluated the effects of vitexin and aspirin on NFKB1 and COX-2, two prime candidates in CRC, using computational analysis. The study showed that both vitexin and aspirin stabilized the two proteins when bound to them, and it also evaluated the two compounds in combination40. The combination treatment made the protein more stable and flexible throughout the trajectory. In addition, inhibition of the KRAS protein was evident from the higher binding affinity and increased compactness of the protein structure when bound to both vitexin and aspirin. In the current study, RMSD analysis showed that the protein-ligand complex was more stable than the protein alone. RMSF analysis of individual residues showed that the KRAS GTPase fluctuated less in the last 1000 frames than in the first 1000 frames of the trajectory. Moreover, Rg analysis showed that the protein remained compact throughout the trajectory, and many hydrogen bonds (an average of 70) were formed, all of which indicate the stability of the protein-ligand complex41.

Compound CID_23724664 is predicted to bind a critical allosteric pocket that overlaps the switch I/II regions of KRAS, which are essential for effector interaction. Binding may stabilize an inactive conformation and thereby prevent interaction with RAF and other downstream signaling partners. This interference could suppress MAPK/ERK pathway activation, consistent with the intended inhibitory effect, and the stable interaction profile observed over the simulation supports this proposed mechanism. Together, these data suggest that the interaction between KRAS and CID_23724664 is both stable and dynamic39. Combining such KRAS-targeted ligands with other therapies, and using improved drug delivery technologies such as nanoparticles, might further enhance therapeutic outcomes for patients with colorectal cancer compared with currently available treatments42.

Conclusion

This study highlights the overexpression of the KRAS GTPase in colorectal cancer and evaluates CID_23724664 (medicarpin 3-O-glucoside) as a promising alternative to the current drug, fruquintinib. In silico docking shows that CID_23724664 has a slightly lower binding energy (-9.7 kcal/mol) than fruquintinib (-9.4 kcal/mol), suggesting that it may form a stronger and more stable interaction with KRAS. These findings suggest that CID_23724664 could improve CRC treatment, but further experimental and clinical studies are needed to confirm its effectiveness and safety.

Limitations of the study

The lack of in vitro and in vivo validation is a key limitation, and future experimental studies are essential to confirm the compound’s biological efficacy and pharmacological relevance.