Introduction

Polyphenols are a diverse group of natural compounds found in plants. They are well known for antioxidant qualities that help shield the body from damaging free radicals. Fruits, vegetables, tea, coffee, cocoa, and certain spices are among the known sources of polyphenols. The specific health effects of polyphenols, however, depend on their type and on the concentration present in different foods. Research suggests that polyphenols have several health benefits: they can reduce the risk of various chronic diseases such as heart disease, certain cancers, and neurodegenerative disorders.

Polyphenols are naturally occurring secondary bioactive chemicals produced by plants, with a broad spectrum of bioactivities that support health promotion1,2. Structurally, they can be described as phenolic rings connected to various functional groups. These compounds have gained significant attention because of their many applications, ranging from food processing and preservation to the pharmaceutical industry3,4,5. Past investigations revealed that numerous phenols have been used in preparing traditional medicines6. Many deaths worldwide have been attributed to factors such as oxidative stress, hypertension, a weak immune system, microbial infections, and the development of resistance to antibiotics7. The bioactivities of polyphenols enable them to assist in treating various illnesses and other medical conditions8.

Dietary polyphenols are a diverse class of naturally occurring compounds with two phenyl rings and one or more hydroxyl (OH) groups, and they belong to the kingdom Plantae9. Around 4000–8000 polyphenolic substances are currently known, including the flavonoids10. The term polyphenol covers a heterogeneous group of phenolic chemicals11. Flavonoids and phenolic acids are the two main groups of polyphenols, and hydroxycinnamic and hydroxybenzoic acids are the two subcategories of phenolic acids12.
They occur either in non-conjugated form (as an aglycone) or conjugated with substances such as glucose, amines, lipids, organic acids, and carboxylic acids1. The structures of some notable polyphenols are shown in Fig. 1. Polyphenols are secondary metabolites found mostly in the plant kingdom. Owing to their antibacterial, antioxidant, anticancer, antihypertensive, immunomodulatory, and anti-inflammatory properties, polyphenols have considerable health-promoting benefits. The prime objective of this paper is therefore to model the molecular topology of these important polyphenols and to perform a QSPR analysis to predict their physicochemical properties.

Chemical graph theory is the branch of graph theory that applies to the mathematical modelling of chemical substances. A molecular graph (or chemical graph) is a graph representation of the structural relationship between the atoms of a molecule and the chemical bonds among them. Chemical graph theory applies mathematical methods to predict the properties of chemicals, relating the chemical structure of a molecule to its chemical reactivity, physical behavior, and physicochemical properties. It is widely applicable to chemical reaction analysis, material design, drug design, etc1,13,14,15,16,17,18,19,20,21,22,23,24,25. A molecular graph G represents the unsaturated hydrocarbon skeleton of a molecule or compound. The vertex set V(G) corresponds to the non-hydrogen atoms, and the edge set E(G) represents the covalent bonds between atoms26,27,28,29,30. Omar et al. developed eight derivatives based on the main structure of hydroxychloroquine to treat COVID-19 and used a QSAR investigation to calculate the biological activity of the designed compounds31. Havare generated curvilinear regression models for the boiling point of prospective medicines against COVID-19 using several topological indices32.
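As a minimal sketch of this representation, the Python snippet below encodes the hydrogen-suppressed graph of phenol (a six-carbon ring with one hydroxyl oxygen) as an edge list and counts the degree of each vertex. Phenol is chosen only for illustration and is not necessarily one of the compounds studied here; the vertex labels 1 to 7 and the names `PHENOL_EDGES` and `vertex_degrees` are assumptions of this example.

```python
from collections import defaultdict

# Hydrogen-suppressed molecular graph of phenol (illustrative labelling):
# vertices 1-6 are the ring carbons, vertex 7 is the hydroxyl oxygen.
PHENOL_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7)]


def vertex_degrees(edges):
    """Return a dict mapping each vertex to its degree (number of incident edges)."""
    degrees = defaultdict(int)
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return dict(degrees)


phenol_degrees = vertex_degrees(PHENOL_EDGES)
print(phenol_degrees)
```

In this labelling the ring carbon bearing the hydroxyl group has degree 3, the other ring carbons have degree 2, and the oxygen has degree 1.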

In 1972, Gutman33 defined the first and second Zagreb indices as follows, where d_u and d_v denote the degrees of the end vertices u and v of the edge uv:

$$M_{1}(G)=\sum_{uv\in E(G)}\left(d_{u}+d_{v}\right)$$
$$M_{2}(G)=\sum_{uv\in E(G)}\left(d_{u}\cdot d_{v}\right)$$
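As a worked illustration, consider the hydrogen-suppressed graph of phenol, a six-membered carbon ring with one oxygen attached (phenol is used here only as a small example and is not necessarily among the compounds studied). One ring carbon has degree 3, the other five have degree 2, and the oxygen has degree 1, so the seven edges split into two edges with end degrees (2, 3), four with (2, 2), and one with (1, 3):

$$\begin{gathered} M_{1}(\text{phenol}) = 2(2+3) + 4(2+2) + 1(1+3) = 30 \\ M_{2}(\text{phenol}) = 2(2\cdot 3) + 4(2\cdot 2) + 1(1\cdot 3) = 31 \end{gathered}$$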

Shirdel et al.34 formulated the hyper Zagreb index as

$$HM(G)=\sum_{uv\in E(G)}\left(d_{u}+d_{v}\right)^{2}$$

Ranjini et al.35 redefined the second and third Zagreb indices as

$$ReZG_{2}(G)=\sum_{uv\in E(G)}\frac{d_{u}\times d_{v}}{d_{u}+d_{v}}$$
$$ReZG_{3}(G)=\sum_{uv\in E(G)}\left(d_{u}\times d_{v}\right)\left(d_{u}+d_{v}\right)$$

Vukičević et al.36 proposed the symmetric division degree index as

$$SSD(G)=\sum_{uv\in E(G)}\left(\frac{d_{u}^{2}+d_{v}^{2}}{d_{u}\times d_{v}}\right)$$
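The six definitions above all sum a function of the end-vertex degrees over the edges, so they can be evaluated in a single pass over an edge list. The sketch below is one possible implementation, not the authors' code; the function name `degree_indices`, the dictionary keys, and the choice of catechol (benzene-1,2-diol) as a test molecule are assumptions made for illustration.

```python
from collections import defaultdict


def degree_indices(edges):
    """Compute six degree-based indices of a simple graph given as an edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    out = {"M1": 0.0, "M2": 0.0, "HM": 0.0, "ReZG2": 0.0, "ReZG3": 0.0, "SSD": 0.0}
    for u, v in edges:
        du, dv = deg[u], deg[v]
        out["M1"] += du + dv                          # first Zagreb
        out["M2"] += du * dv                          # second Zagreb
        out["HM"] += (du + dv) ** 2                   # hyper Zagreb
        out["ReZG2"] += (du * dv) / (du + dv)         # redefined second Zagreb
        out["ReZG3"] += (du * dv) * (du + dv)         # redefined third Zagreb
        out["SSD"] += (du ** 2 + dv ** 2) / (du * dv) # symmetric division degree
    return out


# Hydrogen-suppressed graph of catechol (illustrative labelling):
# ring carbons 1-6, hydroxyl oxygens 7 (on carbon 1) and 8 (on carbon 2).
CATECHOL_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7), (2, 8)]

catechol = degree_indices(CATECHOL_EDGES)
for name, value in catechol.items():
    print(f"{name}: {value:.4f}")
```

For this small graph the sums can be checked by hand, for example M1 = 36 and M2 = 39.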

Similarly, many other indices can be used in QSPR/QSAR analysis14,26,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52. Interest in mathematical chemistry has grown in recent years, and since 1988 numerous academic articles on the subject have been published annually. Chemical graph theory connects graph theory with chemistry and produces results that chemists can use. The chemical applications of graph theory have been thoroughly discussed in a wide range of works53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68.

Materials and methods

Simple molecular graphs of polyphenols are considered for molecular topological modeling. Edge partitioning, vertex partitioning, and computational techniques of graph theory are applied to compute the topological indices of the six structures under consideration. Regression models are then formulated to relate the computed topological indices to the physical properties of the considered molecules. The regression analysis was performed using MS Excel.
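Edge partitioning groups the edges of a molecular graph by the degrees of their end vertices, so that an index becomes a short weighted sum over a few edge classes. The following sketch shows the idea under stated assumptions: the helper names `edge_partition` and `index_from_partition` are hypothetical, and resorcinol (benzene-1,3-diol) is used only as a small illustrative graph, not as one of the six structures.

```python
from collections import Counter, defaultdict


def edge_partition(edges):
    """Count edges by the (smaller, larger) degrees of their end vertices."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


def index_from_partition(partition, term):
    """Sum term(du, dv) over the edge classes, weighted by class size."""
    return sum(count * term(du, dv) for (du, dv), count in partition.items())


# Hydrogen-suppressed graph of resorcinol (illustrative labelling):
# ring carbons 1-6, hydroxyl oxygens 7 (on carbon 1) and 8 (on carbon 3).
RESORCINOL_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7), (3, 8)]

partition = edge_partition(RESORCINOL_EDGES)
m1 = index_from_partition(partition, lambda du, dv: du + dv)
m2 = index_from_partition(partition, lambda du, dv: du * dv)
print(dict(partition), m1, m2)
```

Here the eight edges fall into three classes, (2, 3), (2, 2), and (1, 3), and any degree-based index follows by changing the `term` function.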

Results and discussion

Regression model

Four physical properties, namely complexity, boiling point (BP), molecular weight (MW), and polar surface area (PSA), are studied for each of the six polyphenols. Regression analysis is performed for the six polyphenols using the following model:

$$P = A + B\,[\text{TI}] \tag{1}$$

where A and B are constants, P is the physical property of the compound, and TI is the topological descriptor.

This linear regression equation defines the regression model for each topological index under consideration. The topological indices of the molecular graphs of the six polyphenols are treated as independent variables, and the physical properties as dependent variables. The linear regression models are built in MS Excel. The constants A and B in regression Eq. (1) are determined from the data in Tables 1 and 2.
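The fit of Eq. (1) is an ordinary least-squares line, so it can be reproduced outside a spreadsheet. The sketch below is a plain-Python stand-in for the MS Excel fit, not the authors' workflow; the function name `fit_line` is hypothetical, and the input numbers are synthetic placeholders rather than values from Tables 1 and 2.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = A + B*x; returns (A, B, r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # constant B in Eq. (1)
    intercept = mean_y - slope * mean_x  # constant A in Eq. (1)
    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
    return intercept, slope, r


# Synthetic placeholder data (NOT values from Tables 1 and 2):
# the property is constructed as exactly 5 + 2 * TI.
ti_values = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
property_values = [25.0, 45.0, 65.0, 85.0, 105.0, 125.0]

A, B, r = fit_line(ti_values, property_values)
print(f"A = {A:.4f}, B = {B:.4f}, r = {r:.6f}")
```

Replacing the two lists with one column of index values and one column of property values for the six compounds would give the constants for that index and property pair.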

Fig. 1 Molecular structures of the considered polyphenols.

For the first Zagreb index M1(G)

$$\begin{gathered} \text{Boiling point} = 99.84728 + 4.494093\,[M_{1}(G)] \hfill \\ \text{Molecular weight} = 0.306809 + 3.014766\,[M_{1}(G)] \hfill \\ \text{Complexity} = -67.2393 + 4.232474\,[M_{1}(G)] \hfill \\ \text{Polar surface area} = 3.143836 + 1.050685\,[M_{1}(G)] \hfill \\ \end{gathered}$$

For the second Zagreb index M2(G)

$$\begin{gathered} \text{Boiling point} = 111.4944 + 3.857334\,[M_{2}(G)] \hfill \\ \text{Molecular weight} = 12.89697 + 2.513164\,[M_{2}(G)] \hfill \\ \text{Complexity} = -45.6952 + 3.467977\,[M_{2}(G)] \hfill \\ \text{Polar surface area} = 6.581182 + 0.890683\,[M_{2}(G)] \hfill \\ \end{gathered}$$

For the hyper Zagreb index HM(G)

$$\begin{gathered} \text{Boiling point} = 109.4777 + 0.909583\,[HM(G)] \hfill \\ \text{Molecular weight} = 12.43414 + 0.589517\,[HM(G)] \hfill \\ \text{Complexity} = -46.6661 + 0.8147\,[HM(G)] \hfill \\ \text{Polar surface area} = 6.002001 + 0.210442\,[HM(G)] \hfill \\ \end{gathered}$$

For the redefined second Zagreb index ReZG2(G)

$$\begin{gathered} \text{Boiling point} = 101.9823 + 19.24075\,[ReZG_{2}(G)] \hfill \\ \text{Molecular weight} = -0.08718 + 13.04396\,[ReZG_{2}(G)] \hfill \\ \text{Complexity} = -67.853 + 18.31713\,[ReZG_{2}(G)] \hfill \\ \text{Polar surface area} = 3.697662 + 4.494249\,[ReZG_{2}(G)] \hfill \\ \end{gathered}$$

For the redefined third Zagreb index ReZG3(G)

$$\begin{gathered} \text{Boiling point} = 125.0405 + 0.73884\,[ReZG_{3}(G)] \hfill \\ \text{Molecular weight} = 27.1302 + 0.464299\,[ReZG_{3}(G)] \hfill \\ \text{Complexity} = -22.2535 + 0.628695\,[ReZG_{3}(G)] \hfill \\ \text{Polar surface area} = 10.35403 + 0.168566\,[ReZG_{3}(G)] \hfill \\ \end{gathered}$$

For the symmetric division degree index SSD(G)

$$\begin{gathered} \text{Boiling point} = 89.26778 + 8.941635\,[SSD(G)] \hfill \\ \text{Molecular weight} = -6.76119 + 5.997342\,[SSD(G)] \hfill \\ \text{Complexity} = -78.9525 + 8.479101\,[SSD(G)] \hfill \\ \text{Polar surface area} = 0.255878 + 2.104229\,[SSD(G)] \hfill \\ \end{gathered}$$
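Each of the fitted equations above is an instance of Eq. (1), so applying them amounts to a table lookup followed by one multiplication and one addition. The sketch below transcribes the reported intercepts and slopes into a dictionary; the names `COEFFICIENTS` and `predict`, the short property keys, and the example index value are assumptions of this illustration, and the example value is hypothetical rather than taken from Table 2.

```python
# (intercept A, slope B) pairs transcribed from the regression equations above.
COEFFICIENTS = {
    "M1": {"BP": (99.84728, 4.494093), "MW": (0.306809, 3.014766),
           "Complexity": (-67.2393, 4.232474), "PSA": (3.143836, 1.050685)},
    "M2": {"BP": (111.4944, 3.857334), "MW": (12.89697, 2.513164),
           "Complexity": (-45.6952, 3.467977), "PSA": (6.581182, 0.890683)},
    "HM": {"BP": (109.4777, 0.909583), "MW": (12.43414, 0.589517),
           "Complexity": (-46.6661, 0.8147), "PSA": (6.002001, 0.210442)},
    "ReZG2": {"BP": (101.9823, 19.24075), "MW": (-0.08718, 13.04396),
              "Complexity": (-67.853, 18.31713), "PSA": (3.697662, 4.494249)},
    "ReZG3": {"BP": (125.0405, 0.73884), "MW": (27.1302, 0.464299),
              "Complexity": (-22.2535, 0.628695), "PSA": (10.35403, 0.168566)},
    "SSD": {"BP": (89.26778, 8.941635), "MW": (-6.76119, 5.997342),
            "Complexity": (-78.9525, 8.479101), "PSA": (0.255878, 2.104229)},
}


def predict(index_name, property_name, index_value):
    """Evaluate P = A + B * TI for the chosen index and property."""
    a, b = COEFFICIENTS[index_name][property_name]
    return a + b * index_value


example_m1 = 100.0  # hypothetical index value, not taken from Table 2
for prop in ("BP", "MW", "Complexity", "PSA"):
    print(prop, round(predict("M1", prop, example_m1), 4))
```

Because the models were fitted on six compounds, values obtained this way for other molecules should be read as extrapolations.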

The correlation coefficients

Table 1 lists the four physical properties of the polyphenols used in this study; these values were taken from the PubChem database. Table 2 shows the values of the six topological indices, which were obtained via edge partitioning, vertex partitioning, and computational techniques of graph theory. Table 3 shows the correlation coefficients between the four physical properties and the six topological indices. From Table 3, it can be observed that the first Zagreb index shows a strong correlation (r = 0.992706) with molecular weight. Figure 2 is a graphic depiction of the correlation coefficients of the TIs and physical properties. Tables 4, 5, 6, 7, 8 and 9 give the statistical parameters, where N is the sample size, b is the slope, A is the constant (intercept), and r is the correlation coefficient. For each term, the null hypothesis is that its coefficient equals zero; the greater the p-value, the more probable it is that changes in the predictor are unrelated to changes in the response. The F-test takes as its null hypothesis that all regression coefficients are zero, in which case the model has no predictive capability; it can therefore be used to assess whether the fitted model is superior to a model without predictor variables. Table 10 gives the standard error of estimate for the physical properties of the polyphenols under study. Tables 11, 12, 13 and 14 compare the computed and actual values of the physical properties of the polyphenols.
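For a single-predictor model, the correlation coefficient, the F value, and the standard error of estimate all follow from the same sums of squares. The sketch below shows one way to compute them; the function name `regression_stats` is hypothetical, the data are synthetic placeholders (not the values behind Tables 4 to 10), and the p-value is left out because it needs the F distribution, which the Python standard library does not provide.

```python
import math


def regression_stats(x, y):
    """Return (r, F, standard_error) for the simple linear fit y = A + b*x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    fitted = [a + b * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)          # regression sum of squares
    r = sxy / math.sqrt(sxx * syy)
    f_value = ssr / (sse / (n - 2))       # 1 and n - 2 degrees of freedom
    std_error = math.sqrt(sse / (n - 2))  # standard error of estimate
    return r, f_value, std_error


# Synthetic placeholder data for six hypothetical compounds.
ti = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
prop = [26.0, 44.5, 66.0, 84.0, 106.5, 124.0]

r, f_value, std_error = regression_stats(ti, prop)
print(f"r = {r:.6f}, F = {f_value:.2f}, SE = {std_error:.4f}")
```

With one predictor, F equals r²(N − 2)/(1 − r²), so a correlation coefficient close to 1 on six compounds necessarily yields a large F value.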

Table 1 Experimental properties.
Table 2 Topological indices values.
Table 3 Correlation Coefficients.
Fig. 2 Physical properties on TI.

Table 3 and Fig. 2 demonstrate that all the topological indices correlate well with the corresponding physical properties. Examining the correlation coefficients, we see that the M1(G) index correlates most strongly with molecular weight (r = 0.992706). The second Zagreb index has a high correlation (r = 0.999204) with complexity, the atom bond connectivity index has a high correlation (r = 0.99985) with polar surface area, the geometric arithmetic index has the highest correlation coefficient (r = 0.999883) for molecular weight, the symmetric division degree index gives its best correlation (r = 0.999484) with polar surface area, and the harmonic index shows a good correlation (r = 0.999483) with molecular weight. These results indicate that the considered topological indices have the potential to predict the properties efficiently and can serve as theoretical alternatives to laborious laboratory experimentation.

Table 4 The statistical parameters for M1.
Table 5 The statistical parameters for M2.
Table 6 The statistical parameters for HM.
Table 7 The statistical parameters for ReZG2.
Table 8 The statistical parameters for ReZG3.
Table 9 The statistical parameters for SSD.
Table 10 Standard error of estimate.
Table 11 Comparison of the computed values generated by regression model of TI with actual values of boiling points.
Table 12 Comparison of the computed values generated by regression model of TI with actual values of molecular weight.
Table 13 Comparison of the computed values generated by regression model of TI with actual values of complexity.
Table 14 Comparison of the computed values generated by regression model of TI with actual values of polar surface area.

Conclusions

Degree-based topological indices can be used to quantify and analyze the structural features of polyphenolic compounds. By incorporating these indices into QSPR models, we can establish relationships between the structural characteristics of polyphenols and their physical properties. The results demonstrate that all the topological indices correlate well with the corresponding physical properties. Examining the correlation coefficients, we see that the M1(G) index correlates most strongly with molecular weight (r = 0.992706). The second Zagreb index has a high correlation (r = 0.999204) with complexity, the atom bond connectivity index has a high correlation (r = 0.99985) with polar surface area, the geometric arithmetic index has the highest correlation coefficient (r = 0.999883) for molecular weight, the symmetric division degree index gives its best correlation (r = 0.999484) with polar surface area, and the harmonic index shows a good correlation (r = 0.999483) with molecular weight. Degree-based topological indices also provide insight into the number of bonds, connectivity patterns, and branching characteristics of polyphenolic compounds. QSPR analysis uses these indices together with experimental data on the physical characteristics of polyphenols to develop predictive models.

In future work, the integration of machine learning techniques, such as random forests, support vector machines, and gradient boosting regressors, can enhance the predictive power of QSPR models by capturing complex, non-linear relationships between topological indices and physicochemical properties. Ensemble learning methods, in particular, offer robustness against overfitting and can aggregate predictions from multiple base models to improve generalization. These data-driven approaches can complement traditional linear regression by identifying subtle structural patterns and interactions that may be overlooked in linear models. As more polyphenolic data becomes available, combining degree-based descriptors with advanced regression frameworks could lead to more accurate, scalable, and interpretable models for screening and evaluating polyphenolic compounds in drug discovery, nutraceuticals, and materials chemistry.