Data-driven analysis of chemical graph of carbazole and diketopyrrolopyrrole

Mufti, Zeeshan Saleem; Khan, Azhar Ahmed; Asim, Muhammad; Shflot, A. S.; Saeed, Syed Tauseef; Morga, Feyisa Edosa

doi:10.1038/s41598-025-04878-5


Article
Open access
Published: 29 September 2025

Scientific Reports volume 15, Article number: 33631 (2025)

Abstract

Topological indices play a key role in molecular graph theory: they are mathematical tools that assign numerical values to molecular structures. These indices are used to predict a variety of physicochemical, biological, and pharmacological properties of chemical compounds. This study performs a detailed statistical analysis of different topological indices, such as the first Zagreb index, examining its associations with other indices through regression modeling and correlation analysis. Using machine learning techniques, the study generates predictive models, including linear, quadratic, and cubic regression equations. The results show that linear regression delivers the most accurate predictions, whereas the quadratic regression model clarifies the comparison of actual versus predicted values and thereby improves the evaluation of molecular properties. A statistical evaluation of the selected topological indices involved computing essential metrics such as mean, median, variance, standard deviation, range, interquartile range (IQR), skewness, and kurtosis. These metrics expand our understanding of the distribution and variability of the indices, confirming their robustness in molecular description and predictive modeling. Using a machine learning-based statistical method, the study extends the use of topological indices in cheminformatics, drug discovery, and materials science. These findings assist the development of QSAR and QSPR models and underline the critical role of statistical validation of molecular descriptors. This method promotes more accurate, data-driven strategies in computational chemistry and bioinformatics.

Introduction

Chemical graph theory is an interdisciplinary field that bridges chemistry with mathematical graph modeling. In this domain, topological indices serve as graph invariants, playing a crucial role in chemical and pharmaceutical sciences. These indices are particularly useful for predicting the physicochemical properties of organic compounds. Over the years, extensive research in chemical graph theory has introduced numerous topological indices.

Also referred to as graph parameters, topological indices, many of which are derived from vertex degrees, have diverse applications, making them valuable to both mathematicians and chemists. Since H. Wiener introduced the Wiener index in 1947¹, nearly three thousand topological indices have been catalogued in chemical databases.

A topological index is a graph invariant that represents the topology of a molecular structure by translating the molecular graph into a numerical value. This value aids in predicting various physicochemical properties, like melting point, boiling point, and freezing point. In the modern pharmaceutical industry, conducting biological tests on chemical compounds requires a large financial investment, advanced laboratory facilities, and high-tech equipment. This process is both expensive and time intensive².

To overcome these challenges, pharmaceutical firms are actively exploring cost-effective alternatives. A promising alternative involves analyzing chemical structure using topological indices, which can eliminate the need for costly equipment and extensive lab testing. This approach offers a more economical and time-efficient way to study chemical properties.

Topological indices are numerical tools in mathematical chemistry and cheminformatics. They quantify a molecular structure by translating its topology into numerical values. Derived from the molecular graph, they serve as powerful tools for predicting biological, physicochemical, and pharmacological properties of compounds. The fundamental concept is to represent a molecular structure as numerical values while retaining its connectivity and structural essence. By leveraging topological indices, researchers can construct QSAR (quantitative structure-activity relationship) models, which play a crucial role in drug discovery, materials science, and various chemical applications³.

One of the earliest and most well-known topological indices is the Zagreb index, first introduced by Gutman and Trinajstić in 1972. It has two main variants: the first Zagreb index $(\lambda _1)$ and the second Zagreb index $(\lambda _2)$. These indices are defined based on the degrees of vertices in a molecular graph. Over time, several modifications and extensions of the Zagreb indices have emerged, such as the third Zagreb index $(\lambda _3)$, the redefined Zagreb indices and the reduced Zagreb indices. These enhancements have shown improved predictive capabilities in various biological and chemical studies⁴.

Another important class of topological indices includes degree-based indices like the augmented Zagreb index (AZI) and the atom-bond connectivity (ABC) index. The atom-bond connectivity index, introduced by Estrada et al. (1998), is extensively used for estimating the stability and enthalpy of formation of chemical compounds. Likewise, the augmented Zagreb index, an extension of the Zagreb indices, provides better correlation with thermodynamic properties and finds applications in nanotechnology and materials science⁵.

Recently, modified versions of topological indices have gained interest owing to their enhanced accuracy and computational efficiency. These indices, such as the redefined first Zagreb index $(ReZG_1)$, redefined second Zagreb index $(ReZG_2)$, and redefined third Zagreb index $(ReZG_3)$, enhance molecular characterization by providing improved discriminative capabilities. Research has shown that these indices can surpass traditional ones in QSAR/QSPR modeling, making them valuable tools in computational chemistry and pharmaceutical research⁶.

Topological indices are extensively utilized in various scientific fields because of their broad range of applications. In drug discovery, these indices aid in predicting the biological activity of pharmaceutical compounds, facilitating the optimization of molecular properties to improve efficacy while reducing toxicity. Researchers can quickly identify possible drug candidates by incorporating topological indices into machine learning algorithms, which decreases the time and expense of experimental drug screening. These indices are also vital when exploring protein-ligand interactions, helping with the discovery of new inhibitors and drugs.

Alongside drug discovery, topological indices are significant in materials science and nanotechnology. The development of polymers and nanomaterials relies on them because they make it easier to predict important material properties, including stability, reactivity, and electrical activity. Scientists employ these indices to tune structural properties for particular uses, such as advanced composites, conductive polymers, and high-performance coatings. Topological indices also aid the development of eco-friendly materials by guiding the creation of biodegradable, sustainable compounds with specific uses.

Topological indices, which offer insight into the structure and function of chemical compounds, are important instruments in mathematical chemistry. With continued refinement, these indices are expected to contribute substantially to future findings in materials science, bioinformatics, and drug discovery. Predictive modelling and molecular design could advance further through the integration of topological indices with machine learning and artificial intelligence⁷.

Preliminary framework and methodology

A graph $G =(V(G), E(G))$ consists of a vertex set V(G) and an edge set E(G), where edges denote connections between vertices. The number of edges |E(G)| determines the size of G, whereas the number of vertices |V(G)| determines its order. The degree of a vertex $u\in V(G)$ is the number of edges incident to it, and it is denoted deg(u) or $d_u$. A graph is regular if all its vertices have the same degree, and irregular otherwise.
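The definitions above can be made concrete with a minimal Python sketch. It assumes the graph is simple, undirected, and stored as an edge list; the example edge list (a five-vertex tree resembling the carbon skeleton of 2-methylbutane) and the helper name `vertex_degrees` are illustrative assumptions, not taken from the paper or from the Cz-Dpp graph.

```python
from collections import Counter

def vertex_degrees(edges):
    """Return {vertex: degree} for a simple undirected graph given as an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

# Hypothetical example graph (hydrogen-suppressed skeleton), not the Cz-Dpp graph.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]

degrees = vertex_degrees(EDGES)
order = len(degrees)  # |V(G)|, assuming the graph has no isolated vertices
size = len(EDGES)     # |E(G)|
is_regular = len(set(degrees.values())) == 1  # regular: all degrees equal

print(order, size, degrees, is_regular)
```

For this example the order is 5, the size is 4, and the graph is irregular because its degrees are 1, 2, and 3.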

The first Zagreb index $\lambda _1(G)$ and the second Zagreb index $\lambda _2(G)$ are defined as follows:

$$\begin{aligned} \lambda _1(G)= & \sum _{uv \in E(G)} (d_u + d_v) \end{aligned}$$

(1)

$$\begin{aligned} \lambda _2(G)= & \sum _{uv \in E(G)} (d_u \cdot d_v) \end{aligned}$$

(2)
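As a hedged illustration of Eqs. (1) and (2), the following Python sketch evaluates both indices on a small edge list. The function names and the example graph are assumptions made for illustration only. The sketch also checks the well-known equivalent vertex form of the first Zagreb index, $\sum _{v \in V(G)} d_v^2$, which holds because each vertex v contributes $d_v$ once for every edge incident to it.

```python
from collections import Counter

def vertex_degrees(edges):
    """Map each vertex to its degree in a simple undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def first_zagreb(edges):
    """Eq. (1): sum of (d_u + d_v) over all edges uv."""
    d = vertex_degrees(edges)
    return sum(d[u] + d[v] for u, v in edges)

def second_zagreb(edges):
    """Eq. (2): sum of (d_u * d_v) over all edges uv."""
    d = vertex_degrees(edges)
    return sum(d[u] * d[v] for u, v in edges)

# Hypothetical example graph, not the Cz-Dpp graph.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]

lambda_1 = first_zagreb(EDGES)
lambda_2 = second_zagreb(EDGES)
# Vertex form of the first Zagreb index: sum of squared degrees.
vertex_form = sum(k * k for k in vertex_degrees(EDGES).values())

print(lambda_1, lambda_2, vertex_form)  # 16 14 16
```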

The Zagreb indices $\lambda _1(G)$ and $\lambda _2(G)$ were first introduced by Gutman and Trinajstić in 1972, where they appeared in certain approximate expressions for the total $\pi$-electron energy⁸. For a detailed discussion of the mathematical theory and chemical applications of the Zagreb indices, see refs. 9-17. The third Zagreb index $\lambda _3(G)$ is defined as

$$\begin{aligned} \lambda _3(G) = \sum _{uv \in E(G)} (d_u + d_v)^2 \end{aligned}$$

(3)
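A minimal sketch of Eq. (3) follows, again on a hypothetical edge list with illustrative function names. Expanding the square gives $(d_u + d_v)^2 = d_u^2 + d_v^2 + 2 d_u d_v$, so the index should equal the sum of cubed vertex degrees plus twice the second Zagreb index; the code checks this identity numerically rather than relying on it.

```python
from collections import Counter

def vertex_degrees(edges):
    """Map each vertex to its degree in a simple undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def third_zagreb(edges):
    """Eq. (3): sum of (d_u + d_v)^2 over all edges uv."""
    d = vertex_degrees(edges)
    return sum((d[u] + d[v]) ** 2 for u, v in edges)

# Hypothetical example graph, not the Cz-Dpp graph.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]

d = vertex_degrees(EDGES)
lambda_3 = third_zagreb(EDGES)
cube_sum = sum(k ** 3 for k in d.values())        # sum of d_v^3 over vertices
lambda_2 = sum(d[u] * d[v] for u, v in EDGES)     # second Zagreb index, Eq. (2)

print(lambda_3, cube_sum + 2 * lambda_2)  # 66 66
```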

Ediz¹⁸ introduced the reduced first Zagreb index, defined as follows:

$$\begin{aligned} R\lambda _1(G) = \sum _{v \in V(G)} (d_v - 1)^2 \end{aligned}$$

(4)

This index is a modified version of the first Zagreb index, designed to explore the relationship between graph structure and molecular properties, particularly in the field of chemical graph theory¹⁹. The reduced second Zagreb index is defined analogously over the edges of G:

$$\begin{aligned} R\lambda _2(G) = \sum _{uv \in E(G)} (d_u - 1)(d_v - 1) \end{aligned}$$

(5)
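The reduced indices in Eqs. (4) and (5) differ from the classical ones only by subtracting 1 from each degree, which the following sketch makes explicit. The function names and the example edge list are illustrative assumptions. Note that pendant vertices (degree 1) contribute zero to both sums.

```python
from collections import Counter

def vertex_degrees(edges):
    """Map each vertex to its degree in a simple undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def reduced_first_zagreb(edges):
    """Eq. (4): sum of (d_v - 1)^2 over all vertices v."""
    d = vertex_degrees(edges)
    return sum((k - 1) ** 2 for k in d.values())

def reduced_second_zagreb(edges):
    """Eq. (5): sum of (d_u - 1)(d_v - 1) over all edges uv."""
    d = vertex_degrees(edges)
    return sum((d[u] - 1) * (d[v] - 1) for u, v in edges)

# Hypothetical example graph, not the Cz-Dpp graph.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]

r_lambda_1 = reduced_first_zagreb(EDGES)
r_lambda_2 = reduced_second_zagreb(EDGES)

print(r_lambda_1, r_lambda_2)  # 5 2
```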

Furtula, Graovac, and Vukićević (2010) presented the augmented Zagreb index as an enhancement of the conventional Zagreb indices; in QSAR/QSPR research it shows a stronger correlation with molecular attributes such as stability, enthalpy of formation, and boiling temperature²⁰.

$$\begin{aligned} AZI(G) = \sum _{uv \in E(G)} \left( \frac{d_u \cdot d_v}{d_u + d_v - 2} \right) ^3 \end{aligned}$$

(6)
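Eq. (6) can be sketched in Python as below. One detail worth noting is that the denominator $d_u + d_v - 2$ vanishes for an edge joining two degree-1 vertices, so the sketch raises an error in that case; this guard, the function names, and the example graph are assumptions added for illustration and are not part of the paper.

```python
from collections import Counter

def vertex_degrees(edges):
    """Map each vertex to its degree in a simple undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def augmented_zagreb(edges):
    """Eq. (6): sum of (d_u * d_v / (d_u + d_v - 2))^3 over all edges uv."""
    d = vertex_degrees(edges)
    total = 0.0
    for u, v in edges:
        denom = d[u] + d[v] - 2
        if denom == 0:
            # Happens only for an isolated edge (both endpoints have degree 1).
            raise ValueError(f"AZI undefined for edge {(u, v)}: d_u + d_v = 2")
        total += (d[u] * d[v] / denom) ** 3
    return total

# Hypothetical example graph, not the Cz-Dpp graph.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]

azi = augmented_zagreb(EDGES)
print(azi)  # 22.75
```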

The redefined Zagreb indices were developed as adaptations of the classic Zagreb indices to better reflect the structural features of molecular graphs. The degree-based adjustments in these indices help in predicting molecular stability and physicochemical features. In QSAR/QSPR investigations, the redefined first, second, and third Zagreb indices provide different approaches to molecular structure analysis²¹.

$$\begin{aligned} ReZG_1(G)= & \sum _{uv \in E(G)} \frac{d_u + d_v}{d_u d_v} \end{aligned}$$

(7)

$$\begin{aligned} ReZG_2(G)= & \sum _{uv \in E(G)} \frac{d_u \cdot d_v}{d_u + d_v} \end{aligned}$$

(8)

$$\begin{aligned} ReZG_3(G)= & \sum _{uv \in E(G)} (d_u \cdot d_v) \cdot (d_u + d_v) \end{aligned}$$

(9)
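Eqs. (7) to (9) can be evaluated together, as in the sketch below. It uses exact rational arithmetic (`fractions.Fraction`) so that the ratios in Eqs. (7) and (8) are not affected by floating-point rounding; that choice, the function name, and the example graph are illustrative assumptions. Because $(d_u + d_v)/(d_u d_v) = 1/d_u + 1/d_v$, the first redefined index of a graph without isolated vertices equals its number of vertices, which gives a convenient sanity check.

```python
from collections import Counter
from fractions import Fraction

def vertex_degrees(edges):
    """Map each vertex to its degree in a simple undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def redefined_zagreb(edges):
    """Return (ReZG_1, ReZG_2, ReZG_3) following Eqs. (7), (8), and (9)."""
    d = vertex_degrees(edges)
    rezg1 = sum(Fraction(d[u] + d[v], d[u] * d[v]) for u, v in edges)
    rezg2 = sum(Fraction(d[u] * d[v], d[u] + d[v]) for u, v in edges)
    rezg3 = sum(d[u] * d[v] * (d[u] + d[v]) for u, v in edges)
    return rezg1, rezg2, rezg3

# Hypothetical example graph, not the Cz-Dpp graph.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]

rezg1, rezg2, rezg3 = redefined_zagreb(EDGES)
print(rezg1, rezg2, rezg3)  # 5 101/30 60
```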

Description of graph of carbazole and diketopyrrolopyrrole $(Cz-Dpp)$

This section outlines the theoretical features of carbazole and diketopyrrolopyrrole $(Cz-Dpp)$. In Table 1 and Table 2, the vertices of the graph G are classified according to their degrees, where n is the parameter governing the vertex count. Fig. 1 illustrates these classifications visually (see also Fig. 2).

Table 1 Partition of the graph G according to the vertex degrees.
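A degree partition of this kind can be computed mechanically, as in the hedged sketch below. It counts vertices per degree class and, as a companion step that is an assumption here rather than something stated above, counts edges per pair of end-vertex degrees. The example edge list and the function name are hypothetical and do not represent the Cz-Dpp graph or the contents of Table 1.

```python
from collections import Counter

def degree_partitions(edges):
    """Return (vertex classes by degree, edge classes by end-vertex degree pair)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Number of vertices having each degree.
    vertex_classes = Counter(deg.values())
    # Number of edges for each unordered degree pair (smaller degree first).
    edge_classes = Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)
    return dict(vertex_classes), dict(edge_classes)

# Hypothetical example graph, not the Cz-Dpp graph.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]

vertex_classes, edge_classes = degree_partitions(EDGES)

# A degree-based index can then be summed class by class, e.g. Eq. (1):
lambda_1 = sum((a + b) * count for (a, b), count in edge_classes.items())

print(vertex_classes, edge_classes, lambda_1)
```

Summing over classes instead of individual edges gives the same value as the edge-by-edge definition (16 for this example) and is what makes closed-form expressions in a size parameter such as n possible.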
