DFT based structural modeling of chemotherapy drugs via topological indices and curvilinear regression

Saeed, Fatima; Idrees, Nazeran; Imran, Muhammad

doi:10.1038/s41598-025-97982-5


Article
Open access
Published: 30 September 2025


Scientific Reports volume 15, Article number: 33755 (2025)


Abstract

The study of the thermodynamics and electronic structure of chemotherapy drugs is crucial for developing effective cancer treatments. Quantitative Structure-Property Relationship (QSPR) analysis is an essential tool for designing and improving chemotherapeutic drugs. This research employs Density Functional Theory (DFT) to compute the thermodynamical and electronic characteristics of several chemotherapeutic drugs. Distance-based topological descriptors are used to characterize the molecular structure of these drugs. The descriptors are then used in curvilinear regression models to predict key thermodynamical attributes and biological activities. We seek to improve the precision of QSPR models by correlating DFT-derived attributes with topological descriptors through curvilinear regression. Our results indicate that curvilinear regression models, especially quadratic and cubic fits, markedly improve the prediction of the thermodynamical properties of drugs. Our findings further show that the Wiener index and the Gutman index outperformed the other indices in predicting drug properties. This method offers a better understanding of the thermodynamics of chemotherapeutic medicines and may support the design of more effective and safer therapeutic compounds. The findings could pave the way for more precise and personalised cancer treatment strategies, ultimately improving patient outcomes. Topological indices account for molecular symmetry, and their application in QSPR modelling holds significant promise for understanding the structural and thermodynamical characteristics of compounds.

Introduction

Chemotherapy, a fundamental aspect of medical oncology, involves the use of one or more anti-cancer drugs in a standard regimen to treat cancer¹. It targets rapidly dividing cancer cells, damaging their DNA and preventing replication, which leads to cell death^2,3. Unlike antibiotics, many chemotherapy drugs interact directly with DNA, inhibiting RNA synthesis and causing DNA strand breaks⁴. The ongoing development of predictive models is crucial for discovering effective therapeutic drugs. Current chemotherapy regimens have several major problems, which makes the development of new medications a necessity. Severe side effects, including harm to healthy cells, are common with current chemotherapy medications and may manifest as nausea, fatigue, and hair loss⁵. In addition, the effectiveness of treatment decreases over time because some tumours develop resistance to current medications⁶. Furthermore, the heterogeneity of cancer types calls for medicines that target specific biochemical pathways⁷. There is therefore a constant demand for novel therapeutic approaches that improve treatment effectiveness and patient well-being while reducing side effects. In light of these obstacles, it is critical to seek novel chemotherapeutic medications to improve cancer treatment outcomes^8,9.

Chemical graph theory is a branch of mathematical chemistry that models chemical structures using graph-theoretical techniques¹⁰. Molecules are represented as graphs in which atoms correspond to vertices and bonds to edges. This representation makes it easier to examine molecular structure, connectivity, and characteristics¹¹. In chemical graph theory, topological indices are numerical descriptors that encode the structural features of a molecule's graph representation¹². These indices store structural and topological information about molecules, so they make it possible to predict a wide range of physical, chemical, and biological properties^{13,14,15,16,17}. Researchers frequently use topological indices to analyze the effects of molecular characteristics on experimental outcomes¹⁸. One detailed analysis, aimed at researchers in medicinal chemistry and computational drug design, shows how topological indices can enhance the drug discovery process¹⁹. Another study shows how QSPR models can use topological indices derived from chemical structure to predict the properties of octane isomers²⁰. Öztürk et al.²¹ determined topological indices for prime ideal sum graphs of commutative rings. Arockiaraj et al.²² used modified reverse degree topological indices to predict the physicochemical properties of cardiovascular medications. Abirami²³ calculated degree-based topological indices for the intricate structure of ruthenium bipyridine. For the boiling points of benzenoid hydrocarbons, the first hyper-Zagreb index gives the best estimate²⁴. Quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR) employ topological indices generated from chemical graphs to predict the biological activity or properties of molecules²⁵.

Quantitative structure-property relationship (QSPR) studies predict the properties of a chemical compound from its molecular structure. Several previous studies have demonstrated the importance of QSPR models in predicting drug activity. Using topological indices and other mathematical methods to quantify the behaviour of organic compounds, including their biological activities, has become a major subject in various fields of study. Comparisons with earlier approaches have found that QSPR models based on topological parameters predict the properties of chemical compounds more accurately. These strategies involve developing mathematical models that correlate the structural attributes of molecules, including drug structures, with their physicochemical properties²⁶. Integrating diverse topological indices with advanced modelling methods, such as multigraph representations and linear regression, has improved our understanding of how molecular structure relates to drug properties. A key factor in this progress is the potential of QSPR to advance the discovery and optimization of new drug candidates and the development of safer and more effective therapeutic agents.

For instance, Parveen et al.²⁷ explored the use of degree-based topological indices in QSPR models to predict fungicide properties, correlating several topological descriptors with the physicochemical properties of the fungicides. Tahir et al.²⁸ developed a QSPR analysis based on linear regression with molecular descriptors to predict the melting points of 349 organic molecules. Wei et al.²⁹ studied QSPR through various topological indices on a variety of drug structures, using these techniques to predict how chemical structures relate to their properties. Ahmed et al.³⁰ predicted the physical and biological properties of novel Alzheimer's disease drugs through QSPR analysis based on linear regression modelling. Ozturk et al.³¹ used topological indices in a QSPR study of several significant COVID-19 medicines, relating the structures of several drug candidates to their physicochemical properties. Zaman et al.³² used degree-based topological descriptors and linear regression modelling to investigate new medications for the treatment of blood cancer. Mahboob et al.³³ combined QSPR analysis with linear regression models to evaluate the physicochemical features of anti-hepatitis medications. Zhang et al.³⁴ used QSPR analysis with topological indices to study drugs for schizophrenia treatment. Such approaches streamline drug selection by letting researchers evaluate candidates against specific criteria, which leads to more efficient and informed decision-making.

Density Functional Theory (DFT) provides accurate information about electronic structure, which makes the descriptors used in QSPR models more reliable. Thomas³⁵ and Fermi³⁶ presented the first density functional approximation. Later, Slater's³⁹ simplifications of the Hartree–Fock^37,38 approach made practical DFT computations possible. One study presented a QSPR model for several chlorine-substituted biphenyl systems using linear and multi-linear regression analysis. That model used conceptual DFT and information-theoretic descriptors to predict the lipophilicity (log KOW) of polychlorobiphenyl congeners⁴⁰. Another study investigated the stability and electrical characteristics of PVP-CaCO3 blends using DFT computations, and carried out QSAR computations to track the responsiveness of the proposed PVP-CaCO3 interactions⁴¹. A further work combined DFT analysis and QSPR modelling to control and predict defect-related characteristics in ZnO nanosheets⁴². One study used topological indices derived from molecular structure to improve the accuracy of curvilinear models predicting the activity of fibrate drugs⁴³. Another relevant study applied these techniques to monocarboxylic acids, examining their thermodynamic properties in relation to temperature-based topological indices⁴⁴. Collectively, these studies suggest that integrating topological indices with curvilinear regression modelling and DFT calculations can significantly improve the prediction of drug activity, providing a robust framework for computational drug discovery and cheminformatics⁴⁵.

Materials and methods

Here, we examine the efficacy of chemotherapy medications using curvilinear regression models based on molecular characteristics. We investigated the following distance-based topological indices: the Wiener index⁴⁶, Schultz index⁴⁷, Harary index⁴⁸, additive Harary index⁴⁹, multiplicative Harary index⁵⁰ and Gutman index⁵¹. These indices are used to evaluate how well curvilinear regression models predict the action of chemotherapeutic medicines. All distance-based topological indices, with their mathematical definitions, are shown in Table 1.

Table 1 Distance-based topological descriptors.
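The indices in Table 1 can be computed from a hydrogen-suppressed molecular graph once all shortest-path distances are known. The Python sketch below is illustrative only. Table 1 is not reproduced in this text, so the sketch assumes the degree-distance forms common in chemical graph theory, each summed over unordered pairs of distinct vertices, where $d_u$ is the degree of vertex $u$:

- Wiener: $W=\sum d(u,v)$
- Harary: $H=\sum 1/d(u,v)$
- Schultz: $\sum (d_u+d_v)\,d(u,v)$
- Gutman: $\sum d_u d_v\,d(u,v)$
- additive Harary: $\sum (d_u+d_v)/d(u,v)$
- multiplicative Harary: $\sum d_u d_v/d(u,v)$

The function name `distance_indices` and the adjacency-dict input format are hypothetical choices made for this sketch.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path (edge-count) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_indices(adj):
    """Six distance-based indices, summed over unordered pairs of distinct vertices.

    adj: dict mapping each vertex of a connected, hydrogen-suppressed
    molecular graph to the list of its neighbours.
    The Schultz and Gutman forms used here are assumed degree-distance
    definitions; check them against Table 1 before reuse.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    dist = {v: bfs_distances(adj, v) for v in adj}
    totals = dict.fromkeys(
        ["Wiener", "Harary", "Schultz", "Gutman",
         "AdditiveHarary", "MultiplicativeHarary"], 0.0)
    for u, v in combinations(adj, 2):
        d = dist[u][v]
        totals["Wiener"] += d
        totals["Harary"] += 1 / d
        totals["Schultz"] += (deg[u] + deg[v]) * d
        totals["Gutman"] += deg[u] * deg[v] * d
        totals["AdditiveHarary"] += (deg[u] + deg[v]) / d
        totals["MultiplicativeHarary"] += deg[u] * deg[v] / d
    return totals


# Toy example: the path graph P3 (e.g. the carbon skeleton of propane).
p3 = {0: [1], 1: [0, 2], 2: [1]}
print(distance_indices(p3))
```

For P3 the sums can be checked by hand: d(0,1) = d(1,2) = 1 and d(0,2) = 2, which gives W = 4 and H = 2.5. Breadth-first search is sufficient here because molecular graphs are unweighted.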

The drugs used in this study, along with their DrugBank IDs, are as follows. Gemcitabine (DB00441), Cytarabine (DB00987), Fludarabine (DB01073), and Capecitabine (DB01101) are used in the treatment of various cancers. Fludarabine and cytarabine are sometimes used to treat lymphomas, acute lymphocytic leukemia (ALL), and acute myeloid leukemia (AML)⁵². Capecitabine is a chemotherapy drug used to treat gastric, breast and colorectal cancer⁵³. Gemcitabine is a chemotherapy medication that is very effective in the treatment of pancreatic cancer⁵⁴. Research indicates that Clofarabine (DB00631), combined with other chemotherapeutic drugs, had a markedly beneficial effect in specific prognostic risk groups of AML patients⁵⁵. Altretamine (DB00488) is less hazardous than other medications used to treat refractory ovarian cancer⁵⁶. Dacarbazine (DB00851) is a chemotherapeutic agent used in the management of melanoma and Hodgkin's lymphoma⁵⁷. More broadly, chemotherapy drugs act through several mechanisms. Topoisomerase inhibitors interfere with topoisomerases, enzymes that help unwind DNA during replication, and thereby cause DNA damage and cell death. Mitotic inhibitors slow mitosis (cell division) by disrupting microtubules, key structural components of the cell⁵⁸.

QSPR modelling has made it easier to predict many different thermodynamical features, and the QSPR models used here base these predictions on topological indices. The following properties are considered: dipole moment (DM), zero-point vibrational energy (ZPVE), molar entropy (S), complexity (C), polarisability (P), heat capacity (CV), topological polar surface area (TPA), sum of electronic and zero-point energies (SEZpE), thermal energy (E), and the octanol-water partition coefficient (XLogP3). Linear, quadratic, and cubic regression models are used to examine the relationships between these attributes and the topological indices. SPSS and MATLAB were used to fit the models and compute their statistical parameters.

The DFT calculations were carried out using Materials Studio⁵⁹ version 8.0 from BIOVIA⁶⁰, based on DMol3⁶¹-optimised geometries of the chemotherapy drugs. The results were used to examine the electron density mapped with electrostatic potential (ESPMs), density of states (DOS) plots, optimised geometries, and the energies of the highest occupied and lowest unoccupied molecular orbitals (HOMO and LUMO, respectively). The DFT calculations on the drug derivatives used the B3LYP (Becke, 3-parameter, Lee-Yang-Parr) hybrid functional. In DFT, hybrid functionals approximate the exchange-correlation energy by mixing a fraction of exact Hartree-Fock exchange with exchange and correlation terms from other (empirical or ab initio) sources. This feature makes B3LYP applicable to a wide class of systems. B3LYP with the 6-31G(d, p) basis set is a common choice in computational chemistry studies because it offers a good balance between accuracy and computational cost⁶². The 6-31G(d, p) basis set was used to represent the wave functions in the eigenvalue problem defined by the Hamiltonian of the Schrödinger equation. 6-31G is a modest split-valence double-zeta basis set. The (d, p) notation means it is augmented with polarization functions: p functions on all hydrogen atoms and d functions on heavy atoms such as carbon, oxygen, nitrogen, and fluorine. Most of the thermodynamical parameters of the chemotherapeutic derivatives, described in detail in the next section, were obtained from frequency calculations performed at the same level of theory as the geometry optimization.
Frequency and thermodynamical calculations on the optimised structures were performed with Gaussian 09⁶³. The optimised structures were visualised using GaussView (version 5.0.8)⁶⁴. ESPMs were drawn with the Avogadro package⁶⁵, and DOS charts were produced with the GaussSum program⁶⁶.
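For readers less familiar with hybrid functionals, the standard textbook form of the B3LYP exchange-correlation energy is given below. It shows explicitly the fraction of Hartree-Fock exchange mentioned above. The mixing parameters are the conventional values $a_0 = 0.20$, $a_x = 0.72$ and $a_c = 0.81$; they are not a detail reported for these particular calculations:

$$E_{xc}^{\mathrm{B3LYP}} = E_{x}^{\mathrm{LDA}} + a_{0}\left(E_{x}^{\mathrm{HF}} - E_{x}^{\mathrm{LDA}}\right) + a_{x}\left(E_{x}^{\mathrm{B88}} - E_{x}^{\mathrm{LDA}}\right) + E_{c}^{\mathrm{VWN}} + a_{c}\left(E_{c}^{\mathrm{LYP}} - E_{c}^{\mathrm{VWN}}\right)$$

The symbols in this expression are:

- $E_{x}^{\mathrm{HF}}$: exact (Hartree-Fock) exchange
- $E_{x}^{\mathrm{B88}}$: Becke's 1988 gradient-corrected exchange
- $E_{c}^{\mathrm{LYP}}$: Lee-Yang-Parr correlation
- $E_{c}^{\mathrm{VWN}}$: Vosko-Wilk-Nusair local correlation (program packages differ in which VWN variant they use)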

To evaluate the efficacy of chemotherapeutic medications, this part builds quantitative structure-property relationships (QSPR) between topological indices and selected thermodynamical properties and activities. The properties were modelled using six distance-based topological indices. The DFT calculations on the chemotherapeutic medicines were based on geometries optimised with DMol3 and were carried out in BIOVIA's Materials Studio 8.0. The following quantities were calculated: molar entropy (S), sum of electronic and zero-point energies (SEZpE), zero-point vibrational energy (ZPVE), thermal energy (E), dipole moment (DM) and heat capacity (CV). The following properties were extracted from ChemSpider⁶⁷ and PubChem⁵⁵: polarisability (P), complexity (C), topological polar surface area (TPA), and octanol-water partition coefficient (XLogP3). These computations covered cancer treatments including Gemcitabine, Cytarabine, Fludarabine, and Capecitabine. Curvilinear regression analysis, which can be performed in SPSS, fits curves to the data rather than straight lines. The curvilinear regression models use the topological indices as independent variables and are defined by the following formulas.

$$y = a + bx,\qquad n,\ R^{2},\ F,\ \mathrm{Se},\ \mathrm{SF}\qquad(\text{Linear regression})$$

$$y = a + b_{1}x + b_{2}x^{2},\qquad n,\ R^{2},\ F,\ \mathrm{Se},\ \mathrm{SF}\qquad(\text{Quadratic regression})$$

$$y = a + b_{1}x + b_{2}x^{2} + b_{3}x^{3},\qquad n,\ R^{2},\ F,\ \mathrm{Se},\ \mathrm{SF}\qquad(\text{Cubic regression})$$

In these equations, $y$ denotes the thermodynamical attribute and $x$ the independent variable (a topological index). The regression constants are $a$ and $b_i$ ($i = 1, 2, 3$), and $n$ is the sample size. $R$ is the correlation coefficient, Se the standard error of the estimate, F Fisher's statistic, and SF the significance of F. The Root Mean Square Error (RMSE) is calculated to assess each model's predictive efficiency by comparing predicted values with the observed data. The optimal predictive model is the one with the lowest RMSE, defined as:

$$RMSE=\sqrt{\frac{\sum_{i=1}^{n}\left(y_{i}-\widehat{y_{i}}\right)^{2}}{n}}$$

Here $n$ is the sample size, $y_i$ the observed values of the dependent variable and $\widehat{y_i}$ the predicted values. RMSE is a key measure of the accuracy of a regression or other predictive model. It indicates how closely the model's predictions agree with the observed data, and a lower RMSE indicates a more accurate model. Researchers can use RMSE to validate models, choose the best model for their data, and judge how well the models fit. Analysing the prediction errors also shows where a model could be improved.
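A minimal NumPy sketch of the three regression models above is shown here. It works for a single topological index and a single property, and computes R², R, F, Se and the RMSE as defined above. It assumes that Se is the standard error of the estimate with $n-k-1$ residual degrees of freedom, where $k = 1, 2, 3$ is the polynomial degree. The function name `fit_polynomial_qspr` is hypothetical. The demo data are synthetic and are not values from this study, which used SPSS and MATLAB.

```python
import numpy as np


def fit_polynomial_qspr(x, y, degree):
    """Fit y = a + b1*x + ... + bk*x^k by least squares and return fit statistics.

    x: values of one topological index for each drug; y: one property.
    degree: 1 (linear), 2 (quadratic) or 3 (cubic).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = y.size, degree
    coeffs = np.polyfit(x, y, degree)  # highest power first: [b_k, ..., b_1, a]
    y_hat = np.polyval(coeffs, x)  # predicted values
    ss_res = float(np.sum((y - y_hat) ** 2))  # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total sum of squares (assumed > 0)
    r2 = 1.0 - ss_res / ss_tot
    dof = n - k - 1  # residual degrees of freedom
    if dof > 0:
        se = float(np.sqrt(ss_res / dof))  # standard error of the estimate
        f_stat = (r2 / k) / ((1.0 - r2) / dof) if r2 < 1.0 else float("inf")
    else:
        se, f_stat = float("nan"), float("nan")
    rmse = float(np.sqrt(ss_res / n))  # RMSE as defined above (divides by n)
    return {"coeffs": coeffs, "R2": r2, "R": float(np.sqrt(max(r2, 0.0))),
            "F": f_stat, "Se": se, "RMSE": rmse}


# Synthetic illustration only (not data from this study): y = x^2 exactly.
x_demo = [0, 1, 2, 3, 4, 5]
y_demo = [v ** 2 for v in x_demo]
fits = {d: fit_polynomial_qspr(x_demo, y_demo, d) for d in (1, 2, 3)}
for d, fit in fits.items():
    print(d, round(fit["R2"], 4), round(fit["RMSE"], 4))
```

The linear, quadratic and cubic models are nested least-squares fits. As a result, in-sample RMSE cannot increase as the degree grows, so RMSE comparisons are most informative between different topological indices fitted at the same degree.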

Results and discussion

Electronic structural properties computation

Figs. 1, 2, 3, 4, 5, 6 and 7 show the key features of the seven chemotherapy derivatives studied. These include optimized geometries, total density of states (DOS) plots, electron density mapped with electrostatic potential (ESPM), and the spatial distributions of the highest occupied molecular orbitals (HOMOs) and lowest unoccupied molecular orbitals (LUMOs). ESPMs show how electron density is distributed over the seven non-planar molecules relative to the electrostatic potential. This reveals which parts of each molecule are most vulnerable to electrophilic or nucleophilic attack. Blue (positively charged) and red (negatively charged) regions indicate zones of nucleophilic and electrophilic attack, respectively. The most electronegative atoms, such as oxygen, nitrogen, and fluorine, are shaded deep red. The least electronegative atoms, hydrogen, are shaded blue, and atoms of intermediate electronegativity, such as carbon, are shaded white. ESPMs therefore indicate the sites most likely to undergo electrophilic or nucleophilic attack. The DOS plot shows the number of electronic energy levels that electrons in the system can occupy at each energy. We examined the seven HOMO energies. A destabilized (less negative) HOMO increases a molecule's capacity to donate electrons. The electron-donating abilities of the seven derivatives are ordered as follows: Dacarbazine (−5.3204 eV) > Altretamine (−5.5492 eV) > Fludarabine (−6.08065 eV) > Cytarabine (−6.25372 eV) > Clofarabine (−6.2667 eV) > Gemcitabine (−6.44284 eV) > Capecitabine (−6.513864 eV). Conversely, the LUMO energy reflects a molecule's capacity to accept electrons: the more negative the LUMO, the greater this capacity.
The electron-accepting capacities of the derivatives are therefore ordered as Capecitabine (−1.67567 eV) > Gemcitabine (−1.131449 eV) > Dacarbazine (−0.9638 eV) > Clofarabine (−0.896 eV) > Cytarabine (−0.83049 eV) > Fludarabine (−0.59375 eV) > Altretamine (0.8272 eV). The HOMO and LUMO are not distributed in the same way over Capecitabine: the LUMO regions are more prominent than the HOMO regions. The energy gap, $\Delta E = E_{LUMO} - E_{HOMO}$, is the difference between the LUMO and HOMO energies and is an important measure of chemical reactivity. A molecule with a narrower gap is more reactive. The energy gaps of the seven derivatives decrease in the order Altretamine (6.3764 eV) > Fludarabine (5.48690 eV) > Cytarabine (5.42323 eV) > Clofarabine (5.371 eV) > Gemcitabine (5.31139 eV) > Capecitabine (4.8381 eV) > Dacarbazine (4.356 eV), so their reactivity increases in the same order.
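The short Python sketch below recomputes $\Delta E = E_{LUMO} - E_{HOMO}$ from the frontier-orbital energies listed above. It then reproduces the donor, acceptor and reactivity orderings. One assumption is made: Fludarabine's LUMO is taken as −0.59375 eV, because only a negative value is consistent with the reported gap of 5.48690 eV. The variable names are illustrative.

```python
# Frontier-orbital energies (eV) as listed in the text: (E_HOMO, E_LUMO).
# Assumption: Fludarabine's LUMO is -0.59375 eV, the only sign consistent
# with its reported gap of 5.48690 eV.
orbitals = {
    "Dacarbazine": (-5.3204, -0.9638),
    "Altretamine": (-5.5492, 0.8272),
    "Fludarabine": (-6.08065, -0.59375),
    "Cytarabine": (-6.25372, -0.83049),
    "Clofarabine": (-6.2667, -0.896),
    "Gemcitabine": (-6.44284, -1.131449),
    "Capecitabine": (-6.513864, -1.67567),
}


def energy_gap(e_homo, e_lumo):
    """HOMO-LUMO gap, Delta E = E_LUMO - E_HOMO, in eV."""
    return e_lumo - e_homo


gaps = {name: energy_gap(h, l) for name, (h, l) in orbitals.items()}
# Electron donation: higher (less negative) HOMO first.
donors = sorted(orbitals, key=lambda name: orbitals[name][0], reverse=True)
# Electron acceptance: lower (more negative) LUMO first.
acceptors = sorted(orbitals, key=lambda name: orbitals[name][1])
# Reactivity: narrower gap first, i.e. most reactive first.
reactivity = sorted(gaps, key=gaps.get)

for name in reactivity:
    print(f"{name:13s} gap = {gaps[name]:.4f} eV")
```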

Because of its smaller energy gap, Dacarbazine is more reactive than the other chemotherapy drugs studied. This enhanced reactivity makes it easier for Dacarbazine to take part in chemical processes, potentially increasing both its efficacy and its susceptibility to interactions with other drugs. Conversely, Altretamine, with a larger energy gap, exhibits greater stability. This suggests that it is less prone to react with other substances and may therefore act more predictably and consistently in the body. In short, Dacarbazine shows the greatest chemical reactivity, whereas Altretamine shows the greatest chemical stability. The two-dimensional distribution of the HOMO and LUMO orbitals gives an additional indication of the regions vulnerable to electrophilic and nucleophilic attack. The thermodynamical properties listed in Table 2 were obtained from ChemSpider, PubChem and the DFT calculations.

Curvilinear regression modeling

Several topological indices are used to predict the properties of the chemotherapy drugs. Table 3 lists the calculated values of the topological indices. The QSPR approach examines linear, quadratic, and cubic regression models, each assessed against the six distance-based topological indices. Table 4 gives the correlation coefficients (R) between these indices and selected thermodynamical parameters under the linear, quadratic, and cubic regression models. For each thermodynamical parameter, the regression model with the highest R is regarded as the most accurate. The tables therefore highlight the greatest R for each property under each model type. For clarity, values below 0.64 have been omitted from the tables.
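The selection rule described above can be sketched as follows. For each property and model type, correlations below 0.64 are dropped and the index with the highest R is kept. This is an assumed reimplementation for illustration only, with R computed as the square root of the polynomial fit's R². The names `best_index` and `R_THRESHOLD` are hypothetical.

```python
import numpy as np

R_THRESHOLD = 0.64  # correlations below this value are omitted from the tables


def correlation_r(x, y, degree):
    """Multiple correlation R = sqrt(R^2) of a polynomial least-squares fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = np.polyval(np.polyfit(x, y, degree), x)
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return float(np.sqrt(max(r2, 0.0)))


def best_index(indices, y, degree):
    """Return (index name, R) with the highest R for one property and model degree.

    indices: dict mapping an index name to its values for each drug.
    Returns None when every index falls below R_THRESHOLD.
    """
    scores = {name: correlation_r(x, y, degree) for name, x in indices.items()}
    kept = {name: r for name, r in scores.items() if r >= R_THRESHOLD}
    if not kept:
        return None
    name = max(kept, key=kept.get)
    return name, kept[name]
```

In practice the function would be called once per property and per degree (1, 2, 3), passing the values of the six indices from Table 3 for the same drugs in the same order as the property values.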

Table 2 Thermodynamical properties of chemotherapy drugs.


Table 3 Distance-based topological indices of chemotherapy drugs.


Table 4 Linear, quadratic and cubic correlation coefficient (R).


Table 5 shows the optimal topological index for evaluating thermodynamical attributes with linear regression models. Figure 8 illustrates this graphically.

Table 5 Best estimation of linear regression models.


The best topological index for evaluating thermodynamical attributes using quadratic regression models is shown in Table 6. Figure 9 shows a graphic that illustrates this.

Table 6 Best estimation of quadratic regression models.


The most efficient topological index for thermodynamical property estimation with cubic regression models is shown in Table 7. Figure 10 illustrates this with a diagram.

Table 7 Best estimation of cubic regression models.


The thermodynamic properties of the chemotherapeutic drugs and the associated distance-based topological indices were investigated using linear, quadratic, and cubic curvilinear models. The main aim was to find the correlation coefficient that most accurately reflects each property. Table 4 displays the correlation coefficient R for the six distance-based topological indices under the linear, quadratic, and cubic models. The highest value for each property is shown in bold, so every property in Table 4 has a corresponding topological index that predicts it best. Tables 5, 6 and 7 give the best thermodynamical property predictions by the topological indices using linear, quadratic, and cubic regression models, respectively. Figs. 8, 9 and 10 show the corresponding best models. The correlation coefficients varied among the topological indices and thermodynamical properties. A positive correlation means that two variables tend to move in the same direction, and a negative correlation means that they move in opposite directions. The coefficient of determination, R², measures the proportion of the variance in the dependent variable explained by the model's independent variables. It ranges from 0 to 1, with 1 indicating a perfect fit. The root mean squared error (RMSE) quantifies how closely the values predicted by the regression model match the actual values. In Tables 5, 6 and 7, the best regression prediction is selected according to the lowest RMSE value.
Table 5 indicates that heat capacity has the highest R² = 0.979, with the lowest RMSE = 1.955, for the Gutman index with the linear regression model $CV = 0.007\,Gut(G) + 43.628$. Dipole moment and the sum of electronic and zero-point energies are best predicted by the multiplicative Harary index, with R² = 0.658 and 0.833 and RMSE = 1.340 and 113.2, respectively. Polarisability, zero-point vibrational energy, enthalpy and molar entropy are best predicted by the Wiener index, with R² = 0.779, 0.710, 0.734 and 0.972 and RMSE = 2.026, 20.23, 20.37 and 20.37, respectively. Complexity is best predicted by the Harary index, with R² = 0.979 and RMSE = 63.35.

Table 6 shows that heat capacity has the highest R² = 0.990, with the lowest RMSE = 1.363, for the Wiener index with the quadratic regression model. The Gutman index best predicts dipole moment and the sum of electronic and zero-point energies, with R² = 0.693 and 0.853 and RMSE = 1.268 and 106.1, respectively. Polarisability, zero-point vibrational energy, enthalpy, molar entropy and complexity are best predicted by the Wiener index, with R² = 0.783, 0.768, 0.786, 0.976 and 0.781 and RMSE = 2.011, 18.07, 18.24, 2.847 and 61.25, respectively.

Table 7 shows that heat capacity has the highest R² = 1, with the lowest RMSE = 0.203, for the Wiener index with the cubic regression model. Dipole moment, polarisability, zero-point vibrational energy and enthalpy are best predicted by the multiplicative Harary index, with R² = 0.740, 0.918, 0.985 and 0.988 and RMSE = 1.167, 1.236, 4.604 and 4.339, respectively. The Harary index best predicts molar entropy, with R² = 0.984 and RMSE = 2.337. Complexity is best estimated by the Gutman index, with R² = 0.883 and RMSE = 44.82. The sum of electronic and zero-point energies (SEZpE) is best estimated by the Schultz index (R² = 0.860, RMSE = 103.6), which also fits heat capacity closely (R² = 1, RMSE = 0.296). The Wiener index predicts SEZpE equally well (R² = 0.860, RMSE = 103.6). From these results, we conclude that the Wiener index is the best predictor and heat capacity is the best-predicted property.
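For reference, the fit statistics quoted above can be written in their standard textbook form. This assumes that Se denotes the standard error of the estimate and that $k$ is the number of regression terms ($k = 1, 2, 3$ for the linear, quadratic and cubic models):

$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-\widehat{y_{i}}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}},\qquad F=\frac{R^{2}/k}{\left(1-R^{2}\right)/\left(n-k-1\right)},\qquad \mathrm{Se}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_{i}-\widehat{y_{i}}\right)^{2}}{n-k-1}}$$

The RMSE divides the residual sum of squares by $n$, whereas Se divides it by the residual degrees of freedom $n-k-1$. With the seven drugs studied here (assuming $n = 7$), a cubic model leaves $n-k-1 = 3$ residual degrees of freedom.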

Model validation and evaluation

Heat capacity is the most accurately predicted property in the QSPR modelling of chemotherapy drugs. Table 8 therefore compares its actual values with those predicted by the linear, quadratic, and cubic models, and all three give near-identical results with small errors. Table 9 shows the actual and predicted values of molar entropy. Both tables highlight the accuracy of the quadratic regression model.

Table 8 Actual and predicted values for heat capacity (CV).


Table 9 Actual and predicted values for molar entropy(S).


Conclusion

Our investigation has shown that curvilinear regression models substantially improve the study of chemotherapy drug action using molecular descriptors. We found that these models outperform linear regression models in predictive power, particularly when the underlying data show nonlinear correlations. Using molecular descriptors as independent variables significantly improved the accuracy and robustness of the models. These results could benefit drug research and development. Curvilinear regression models with molecular descriptors may help identify and optimize more effective and selective medications while reducing development time and cost. Our research also highlights the importance of accounting for nonlinear relationships between molecular descriptors and pharmacological action, which conventional linear regression methods do not capture. Future research might reveal how other classes of drugs can benefit from using curvilinear regression models and topological descriptors to forecast drug activity. In summary, our research shows that curvilinear regression models are a strong tool for pharmacological activity analysis, especially when combined with molecular descriptors. The study has limitations, but the results justify cautious optimism about the potential of curvilinear models to improve chemotherapy drug studies, pending further validation. Our findings shed light on the molecular processes controlling drug action and lay the groundwork for better drug discovery pipelines.

Data availability

All data generated is available within the manuscript.

References

Johnstone, R. W., Ruefli, A. A. & Lowe, S. W. Apoptosis: A link between cancer genetics and chemotherapy. Cell 108(2), 153–164 (2002).
Article PubMed CAS Google Scholar
Sharma, A., Jasrotia, S. & Kumar, A. Effects of chemotherapy on the immune system: implications for cancer treatment and patient outcomes. Naunyn-Schmiedeberg’s Arch. Pharmacol. 397, 2551–2566 (2024).
Article CAS Google Scholar
Nayak, S. G. et al. Mechanisms of action of alkylating agents and their role in cancer therapy. J. Cancer Res. Ther. 8(3), 287–296 (2012).
Lodish, H. F. Molecular Cell Biology (Macmillan, 2008).
Kim, S. & Bolton, E. E. PubChem: A large-scale public chemical database for drug discovery. Databases Datasets Drug Discov., 39–66 (2024).
Brianna & Lee, S. H. Chemotherapy: How to reduce its adverse effects while maintaining the potency? Med. Oncol. 40(3), 88 (2023).
Rudolph, J., Settleman, J. & Malek, S. Emerging trends in cancer drug discovery—from drugging the undruggable to overcoming resistance. Cancer Discov. 11(4), 815–821 (2021).
Brooks, K. L. Why New cancer Treatment Discoveries Are Proliferating (Penn Medicine Magazine, 2023).
Koper, K., Wileński, S. & Koper, A. Advancements in cancer chemotherapy. Phys. Sci. Reviews 8(4), 583–604 (2023).
Trinajstic, N. Chemical Graph Theory (CRC, 2018).
Gutman, I., Furtula, B. & Ghorbani, M. Topological indices in chemical graph theory. Math. Chem. Monogr. 17, 1–382 (2017).
Yao, Y., He, J., Yang, K. & Zhao, J. Representation learning of molecular graphs with recurrent substructure pooling. J. Chem. Inf. Model. 60(12), 5735–5745 (2020).
Klebe, G. Recent developments in structure-based drug design. J. Mol. Med. 78, 269–281 (2000).
Oboudi, M. R. On graphs with integer Sombor index. J. Appl. Math. Comput. 69(1), 941–952 (2023).
Öztürk Sözen, E., Eryaşar, E. & Çakmak, Ş. Szeged-like topological descriptors and COM-polynomials for graphs of some Alzheimer’s agents. Mol. Phys. e2305853 (2024).
Das, K. C., Çevik, A. S., Cangul, I. N. & Shang, Y. On Sombor index. Symmetry 13(1), 140 (2021).
Ediz, S., Çiftçi, I., Cancan, M. & Farahani, M. R. On k-total distance degrees and k-total Wiener polarity index. J. Inform. Optim. Sci. 42(7), 1469–1477 (2021).
Gonzalez-Diaz, H., Vilar, S., Santana, L. & Uriarte, E. Medicinal chemistry and bioinformatics-current trends in drugs discovery with networks topological indices. Curr. Top. Med. Chem. 7(10), 1015–1029 (2007).
Zanni, R., Galvez-Llompart, M., Garcia-Domenech, R. & Galvez, J. Latest advances in molecular topology applications for drug discovery. Expert Opin. Drug Discov. 10(9), 945–957 (2015).
Ravi, V. & Desikan, K. Quantitative Structure-Property Relationship (QSPR) Analysis of some Closed Neighborhood Degree Based Topological Indices for Octane Isomers (Authorea Preprints, 2024).
Öztürk Sözen, E., Alsuraiheed, T., Abdioğlu, C. & Ali, S. Computing topological descriptors of prime ideal sum graphs of commutative rings. Symmetry 15(12), 2133 (2023).
Arockiaraj, M., Greeni, A. B., Kalaam, A. A., Aziz, T. & Alharbi, M. Mathematical modeling for prediction of physicochemical characteristics of cardiovascular drugs via modified reverse degree topological indices. Eur. Phys. J. E. 47(8), 53 (2024).
Abirami, S. J., Raj, S. A. K., Siddiqui, M. K. & Zia, T. J. Computation of degree-based topological indices for the complex structure of ruthenium bipyridine. Int. J. Quantum Chem. 124(1), e27310 (2024).
Rajasekharaiah, G. V. & Murthy, U. P. Hyper-Zagreb indices of graphs and its applications. J. Algebra Combinatorics Discrete Struct. Appl. 8(1), 9–22 (2020).
Diudea, M. V. Basic chemical graph theory. Multi-shell Polyhedral Clusters, 1–21 (2018).
Golbraikh, A., Wang, X. S., Zhu, H. & Tropsha, A. Predictive QSAR modeling: Methods and applications in drug discovery and chemical risk assessment. Handbook Comput. Chemistry, 1309–1342 (2012).
Parveen, S. et al. QSPR modeling of fungicides using topological descriptors. Int. J. Anal. Chem. 2023(1), 9625588 (2023).
Tahir, I., Wijaya, K., Yahya, M. U. & Yapin, M. Quantitative relationships between molecular structure and melting point of several organic compounds. Indonesian J. Chem. 2(2), 83–90 (2002).
Wei, J., Hanif, M. F., Mahmood, H., Siddiqui, M. K. & Hussain, M. QSPR analysis of diverse drugs using linear regression for predicting physical properties. Polycycl. Aromat. Compd. 44(7), 4850–4870 (2024).
Ahmed, W., Ali, K., Zaman, S. & Raza, A. Molecular insights into anti-Alzheimer’s drugs through predictive modeling using linear regression and QSPR analysis. Mod. Phys. Lett. B 38(27), 2450260 (2024).
Öztürk Sözen, E. & Eryaşar, E. QSPR analysis of some drug candidates investigated for COVID-19 via new topological coindices. Polycycl. Aromat. Compd. 44(2), 1291–1308 (2024).
Zaman, S., Yaqoob, H. S. A., Ullah, A. & Sheikh, M. QSPR analysis of some novel drugs used in blood cancer treatment via degree based topological indices and regression models. Polycycl. Aromat. Compd. 44(4), 2458–2247 (2024).
Mahboob, A., Rasheed, M. W., Dhiaa, A. M., Hanif, I. & Amin, L. On quantitative structure-property relationship (QSPR) analysis of physicochemical properties and anti-hepatitis prescription drugs using a linear regression model. Heliyon 10(4) (2024).
Zhang, X. et al. QSPR analysis of drugs for treatment of schizophrenia using topological indices. ACS Omega 8(44), 41417–41426 (2023).
Thomas, L. H. The calculation of atomic fields. Proc. Camb. Philos. Soc. 23, 542–548 (1927).
Fermi, E. A statistical method for determining some properties of the atoms and its application to the theory of the periodic table of elements. Z. Phys. 48, 73–79 (1928).
Hartree, D. R. & Hartree, W. Self-consistent field, with exchange, for beryllium. Proc. R Soc. Lond. Ser. Math. Phys. Sci. 150, 9–33 (1935).
Fock, V. Approximation method for the solution of the quantum mechanical multibody problems. Z. Phys. 61, 126–148 (1930).
Slater, J. C. A simplification of the Hartree-Fock method. Phys. Rev. 81, 385–390 (1951).
Poddar, A., Pal, R., Rong, C. & Chattaraj, P. K. A conceptual DFT and information-theoretic approach towards QSPR modeling in polychlorobiphenyls. J. Math. Chem. 61(5), 1143–1164 (2023).
Refaat, A. & Ibrahim, M. Microspectroscopic, DFT and QSAR study of PVP/CaCO3 blends as potential bone-remineralization membranes. Egypt. J. Chem. 67(2), 29–41 (2024).
Kochnev, N. D. et al. Regulation and prediction of defect-related properties in ZnO nanosheets: Synthesis, morphological and structural parameters, DFT study and QSPR modelling. Appl. Surf. Sci. 621, 156828 (2023).
Wazzan, S. & Ozalan, N. U. Exploring the symmetry of curvilinear regression models for enhancing the analysis of fibrates drug activity through molecular descriptors. Symmetry 15(6), 1160 (2023).
Nagesh, H. M. QSPR Analysis with Curvilinear Regression Modeling and Temperature-based Topological Indices. arXiv:2404.08650 (2024).
Wazzan, S. & Ozalan, N. U. Graph energy variants and topological indices in platinum anticancer drug design: Mathematical insights and computational analysis with DFT and QTAIM. J. Math. 2023(1), 5931820 (2023).
Wiener, H. Structural determination of paraffin boiling points. J. Am. Chem. Soc. 69(1), 17–20 (1947).
Schultz, H. P. Topological organic chemistry. 1. Graph theory and topological indices of alkanes. J. Chem. Inf. Comput. Sci. 29(3), 227–228 (1989).
Plavšić, D., Nikolić, S., Trinajstić, N. & Mihalić, Z. On the Harary index for the characterization of chemical graphs. J. Math. Chem. 12, 235–250 (1993).
Khosravi, B. & Ramezani, E. On the additively weighted Harary index of some composite graphs. Mathematics 5(1), 16 (2017).
Ana, M. & Xiong, L. Multiplicatively weighted Harary index of some composite graphs. Filomat 29(4), 795–805 (2015).
Gutman, I. Selected properties of the Schultz molecular topological index. J. Chem. Inf. Comput. Sci. 34, 1087–1089 (1994).
Pigneux, A., Perreau, V., Jourdan, E., Vey, N., Dastugue, N., Huguet, F. & Reiffers, J. Adding lomustine to idarubicin and cytarabine for induction chemotherapy in older patients with acute myeloid leukemia: the BGMT 95 trial results. Haematologica 92(10), 1327–1334 (2007).
Joint Formulary Committee. British National Formulary Vol. 64 (2012).
Gemcitabine in combination with a second cytotoxic agent in the first-line treatment of locally advanced or metastatic pancreatic cancer: a systematic review and meta-analysis. Targeted Oncol. 12(3), 309–321.
Hanekamp, D. et al. Early assessment of Clofarabine effectiveness based on measurable residual disease, including AML stem cells. Blood J. Am. Soc. Hematol. 137(12), 1694–1697 (2021).
Malik, I. A. Altretamine is an effective palliative therapy of patients with recurrent epithelial ovarian cancer. Jpn J. Clin. Oncol. 31(2), 69–73 (2001).
Sonker, P. et al. A study on cancer and its drugs with their molecular structure and mechanism of action: A review. World J. Pharm. Sci., 13–34 (2018).
Jordan, M. A. & Wilson, L. Microtubules as a target for anticancer drugs. Nat. Rev. Cancer 4(4), 253–265 (2004).
Meunier, M. Guest editorial: Materials Studio. Mol. Simul. 34(10–15), 887–888 (2008).
https://www.3ds.com/products/biovia/reference-center
Delley, B. DMol3 DFT studies: From molecules and molecular environments to surfaces and solids. Comput. Mater Sci. 17(2–4), 122–126 (2000).
De Sousa Sousa, N., Silva, A. L. P. & Silva, A. C. A. DFT analysis of structural, energetic and electronic properties of doped, encapsulated, and decorated first-Row transition metals on B12N12 nanocage: Part 1. J. Inorg. Organomet. Polym Mater. 34(9), 4082–4099 (2024).
Frisch, A. Gaussian 09 W Reference 25 (Wallingford, 2009).
Dennington, R., Keith, T. & Millam, J. GaussView (Semichem Inc., Shawnee Mission, 2009).
Hanwell, M. D. et al. Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminform. 4(1), 17 (2012).
O’Boyle, N. M., Tenderholt, A. L. & Langner, K. M. cclib: a library for package-independent computational chemistry algorithms. J. Comput. Chem. 29(5), 839–845 (2008).
Pence, H. E. & Williams, A. ChemSpider: An online chemical information resource (2010).

Funding

This work received no funding.

Author information

Authors and Affiliations

Department of Mathematics, Government College University Faisalabad, Faisalabad, 38000, Pakistan
Fatima Saeed, Nazeran Idrees & Muhammad Imran

Contributions

All listed authors made significant contributions. F.S.: conceptualisation, methodology, software computation, writing of the original manuscript and formal analysis. N.I.: supervision, methodology, review and editing. M.I.: software computation and validation.

Corresponding author

Correspondence to Nazeran Idrees.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Saeed, F., Idrees, N. & Imran, M. DFT based structural modeling of chemotherapy drugs via topological indices and curvilinear regression. Sci Rep 15, 33755 (2025). https://doi.org/10.1038/s41598-025-97982-5

Received: 02 January 2025
Accepted: 08 April 2025
Published: 30 September 2025
Version of record: 30 September 2025
DOI: https://doi.org/10.1038/s41598-025-97982-5