Applications of Sombor topological indices and entropy measures for QSPR modeling of anticancer drugs: a Python-based methodology

Kara, Yeliz; Sağlam Özkan, Yeşim; Bektaş, Ali Berkan; Arockiaraj, Micheal

doi:10.1038/s41598-025-32906-x

Published: 23 December 2025

Scientific Reports volume 16, Article number: 3005 (2026)

Abstract

The development of effective anticancer drugs remains a central objective in pharmaceutical research. In recent years, topological indices (TIs) have gained considerable attention for their ability to numerically represent molecular structures and support predictive modeling in cheminformatics. This study explores the potential of the recently introduced Sombor topological indices and their entropy-based extensions within the framework of quantitative structure–property relationship (QSPR) modeling. It focuses specifically on anticancer compounds, using graph theory and the edge-partition approach. A comprehensive Python-based computational framework was developed to compute the relevant topological descriptors and entropy measures. The calculated indices were then integrated with statistical regression and machine learning techniques to construct and evaluate QSPR models that predict properties such as boiling point, molar refractivity, heavy atom count, exact mass, flash point, and polarizability. A curated dataset of anticancer agents was employed to ensure data reliability and chemical diversity. Comparative regression analyses indicate that Sombor indices exhibit stronger predictive performance and higher statistical significance than their entropy-based counterparts. These findings highlight the promise of Sombor indices as reliable molecular descriptors for QSPR modeling and as powerful tools in the cheminformatics-guided drug discovery process.

Introduction

The development of effective anticancer drugs remains an important goal in the field of medicinal chemistry and pharmaceutical sciences. In recent years, the integration of cheminformatics and computational methods has significantly accelerated the drug discovery process, in particular by enabling the development of predictive modeling techniques¹. In this context, quantitative structure–property relationship (QSPR) modeling has emerged as a powerful tool for establishing mathematical relationships between the chemical structures of compounds and their biological or physicochemical properties².

Topological indices (TIs), which play a central role in QSPR modeling, are numerical descriptors derived from the molecular graph of a compound. As they encapsulate fundamental structural information, these indices enable the prediction of molecular properties without the need for experimental procedures. The first theoretical QSPR approaches can be traced back to the late 1940s². These approaches correlated biological activities and physicochemical properties with theoretical numerical indices derived from molecular structure. Many TIs, ranging from classical degree-based indices such as the Randić index³ and the Zagreb indices⁵ to the distance-based Wiener index⁴, have been successfully applied to the modeling of various molecular properties over the years⁶,⁷,⁸,⁹,¹⁰. The main advantage of these indices is their ability to establish strong correlations with both structural and physico-biological properties. In recent years, considerable research has been devoted to exploring and refining TIs due to their efficacy in QSPR analysis. Previous studies analyzed the molecular structures of drugs used in lung cancer treatment by computing various topological indices, including degree-based¹¹, neighborhood-based¹², and reverse-degree-based indices¹³. The study in¹⁴ aimed to construct a QSPR model for 14 tuberculosis drugs by employing Revan degree-based topological indices to predict key physicochemical properties. Similarly, the work in¹⁵ applied QSPR analysis to 19 prostate cancer drugs, computed various topological indices, and compared them across 13 physicochemical properties to assess their predictive performance. In¹⁶,¹⁷, computational chemistry approaches were integrated with machine-learning techniques to investigate the relationships between diverse topological descriptors and the physicochemical characteristics of the examined compounds. Furthermore, the study in¹⁸ reported the computation of topological descriptors for several colorectal cancer drugs and evaluated their utility in predicting four physicochemical properties through QSPR modeling.

Recently, a new generation of topological descriptors has been developed beyond these classical indices. In particular, the Sombor topological index, proposed by Gutman in 2021¹⁹, is defined in terms of the vertex degrees and provides a more balanced structural representation. It offers a richer description of topological information, especially with respect to the edge structure of the molecular graph. QSPR models based on ve-degree Sombor indices for predicting key properties of aromatic heterocyclic compounds were developed in²⁰. The molecular structures of antiviral drugs were examined using graph theory and the edge-partition approach in²¹. The predictive performance of the Sombor index and its variants was evaluated using regression models developed for key polycyclic aromatic hydrocarbons (PAHs) in²². A theoretical investigation of Sombor indices within the framework of chemical graph theory is provided in²³.

Shannon entropy quantifies the degree of unpredictability or uncertainty within a data set: higher entropy indicates greater complexity and randomness, while lower entropy reflects more order and predictability²⁴,²⁵. A comparative study of two versatile framework topologies, BCT and DFT, together with an entropy-based structural characterization using bond-wise scaled comparison, is presented in²⁶. Entropy-based descriptors and degree-based topological indices were generated from molecular graph structures using edge partitioning and computed for anticancer drugs with a Python-based algorithm in²⁷. Entropy-based measures, when combined with TIs, offer a powerful framework for quantifying the structural complexity, diversity, and information content of molecular graphs²⁸. By incorporating principles from information theory, these descriptors provide a probabilistic perspective on molecular symmetry and irregularity, features that are frequently critical in determining chemical behavior and biological activity. This is particularly evident for pharmaceutical compounds, where entropy measures complement traditional topological indices by capturing subtle variations in molecular structure that influence drug performance and efficacy²⁹.

The identification and development of anticancer pharmaceuticals remains a key challenge in pharmaceutical research due to the heterogeneity and complexity of cancer. Computational techniques such as QSPR modeling have become widespread alongside experimental drug screening, driven by the dual objectives of enhancing efficiency and reducing the time and cost of the process. These approaches depend on molecular descriptors, among which TIs derived from graph theory are particularly significant.

The 30 pharmaceutical compounds examined in this study represent a broad therapeutic spectrum within oncology. They include agents used to treat diverse solid tumors, including those affecting the breast, lung, prostate, and bladder, as well as agents widely used against hematological malignancies (blood cancers) such as leukemia, T-cell lymphoma, and multiple myeloma. The set covers a variety of pharmacological approaches to cancer treatment, including targeted therapies, alkylating agents, and chemotherapy-supportive agents. These medications constitute the foundation of personalized therapeutic strategies guided by cancer type, disease stage, and individual genetic variability. To quantitatively elucidate the relationship between the structural characteristics of these compounds and their physicochemical properties, this study introduces a systematic, Python-based computational framework for QSPR modeling. Within this framework, Sombor topological indices and their entropy-based extensions are employed to capture the underlying molecular information. A dedicated Python program was developed to automate the computation of these indices and to facilitate the modeling process. Statistical significance is likewise assessed in Python throughout the analysis and modeling steps. This multifaceted approach enhances modeling accuracy and enables a comparative analysis of the performance of different regression models. This work provides a novel comparative evaluation of Sombor and entropy-based topological indices, demonstrating their potential as reliable predictors in anticancer drug modeling.

Motivation and methodology

Cancer remains a significant global health challenge, which calls for efficient and cost-effective methods for understanding and optimizing the physicochemical properties of anticancer compounds. Experimental characterization of these properties is often expensive and time-consuming, underscoring the need for reliable computational alternatives. In this study, graph-theoretical modeling based on degree-dependent and entropy-related topological indices is employed within a QSPR framework to analyze molecular structures. This approach provides an interpretable and low-cost method for assessing drug properties and offers insights that may facilitate the rational design and discovery of more effective anticancer agents. The structural framework of the methodology, along with the tools employed throughout the study, is illustrated in the flowchart provided in Figure 1.

In chemistry, TIs play a crucial role in the study of the structure and properties of molecules. These indices are derived from the underlying molecular structure, which is represented as a graph: the atoms of a molecule are the vertices, and the chemical bonds connecting them are the edges. In this article, a molecular graph is denoted by $\mathscr {G} = (V(\mathscr {G}), E(\mathscr {G}))$, where $V(\mathscr {G})$ is the set of vertices (atoms) and $E(\mathscr {G})$ is the set of edges (chemical bonds). Two vertices u and v of $\mathscr {G}$ are said to be adjacent, or neighboring, if there exists an edge $uv \in E(\mathscr {G})$ connecting them. The degree of a vertex $u \in V(\mathscr {G})$, denoted by d(u), is the number of edges incident to u.
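The graph representation and the vertex degree d(u) described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the edge list `EDGES` (a hydrogen-suppressed carbon skeleton of 2-methylbutane) and the helper name `vertex_degrees` are assumptions introduced here for the example.

```python
from collections import defaultdict

# Hypothetical example graph (not from the paper's dataset):
# hydrogen-suppressed skeleton of 2-methylbutane.
# Vertices 0..4 are carbon atoms; each pair is a C-C bond (an edge uv).
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]

def vertex_degrees(edges):
    """Return {u: d(u)}, the number of edges incident to each vertex."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1  # edge uv contributes one to d(u)
        degree[v] += 1  # ... and one to d(v)
    return dict(degree)

degrees = vertex_degrees(EDGES)
print(degrees)  # {0: 1, 1: 3, 2: 2, 3: 1, 4: 1}
```

Vertex 1 is the branching carbon and therefore has degree 3, while the three terminal carbons have degree 1.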

The general mathematical form of a degree-based topological index ($\mathcal{T}\mathcal{I}$) with function $\psi$ is defined as

$$\begin{aligned} \mathcal{T}\mathcal{I}(\mathscr {G})=\sum _{uv\in E(\mathscr {G})}\psi (d(u),d(v)) \end{aligned} \tag{1}$$

where $\psi (d(u),d(v))$ is a real function of d(u) and d(v) with $\psi (d(u),d(v))\ge 0$. The entropy measure based on the topological index function $\psi$ is given by

$$\begin{aligned} \mathcal {I}_{\psi }(\mathscr {G})=-\sum _{uv\in E(\mathscr {G})}\frac{\psi (d(u),d(v))}{\sum \limits _{u_{1}v_{1}\in E(\mathscr {G})}\psi (d(u_1),d(v_1))}\log \left( \frac{\psi (d(u),d(v))}{\sum \limits _{u_{1}v_{1}\in E(\mathscr {G})}\psi (d(u_1),d(v_1))}\right) . \end{aligned} \tag{2}$$
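As a rough illustration of Eqs. (1) and (2), the sketch below evaluates both quantities for one choice of $\psi$. Several things here are assumptions rather than details taken from the paper: the example graph `EDGES`, the function names, the use of the classical Sombor function $\psi (x,y)=\sqrt{x^2+y^2}$ from Gutman's definition, and the natural logarithm (the text does not state the base of the logarithm).

```python
import math
from collections import Counter

# Hypothetical example graph: carbon skeleton of 2-methylbutane.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]

def degrees_of(edges):
    """Map each vertex to its degree (number of incident edges)."""
    return Counter(vertex for edge in edges for vertex in edge)

def sombor_psi(du, dv):
    """Classical Sombor edge function sqrt(d(u)^2 + d(v)^2) (assumed choice of psi)."""
    return math.hypot(du, dv)

def topological_index(edges, psi):
    """Eq. (1): sum of psi(d(u), d(v)) over all edges uv."""
    deg = degrees_of(edges)
    return sum(psi(deg[u], deg[v]) for u, v in edges)

def index_entropy(edges, psi):
    """Eq. (2): Shannon entropy of the normalized edge weights (natural log assumed)."""
    deg = degrees_of(edges)
    weights = [psi(deg[u], deg[v]) for u, v in edges]
    total = sum(weights)
    # Edges with zero weight are skipped, following the convention 0 * log 0 = 0.
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)

print(topological_index(EDGES, sombor_psi))  # 2*sqrt(10) + sqrt(13) + sqrt(5)
print(index_entropy(EDGES, sombor_psi))
```

Because each term in Eq. (2) is an edge weight divided by the index itself, the weights form a probability distribution over the edges. The entropy is therefore bounded above by the logarithm of the number of edges, a bound reached only when every edge has the same weight.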

The formulations of $\psi (d(u),d(v))$ for the degree-based TIs and entropies of a graph $\mathscr {G}$ are given in Table 1.
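Since $\psi$ depends only on the degree pair (d(u), d(v)), edges with the same pair contribute identically, so they can be grouped and counted. The following is a minimal sketch of such an edge partition, assuming a symmetric $\psi$ (so that the pair can be sorted) and again using the classical Sombor function as a stand-in; the example graph and the helper names are hypothetical, and the full set of functions in Table 1 is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical example graph: carbon skeleton of 2-methylbutane.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]

def edge_partition(edges):
    """Count edges by their sorted end-vertex degree pair (i, j) with i <= j."""
    deg = Counter(vertex for edge in edges for vertex in edge)
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)

def index_from_partition(partition, psi):
    """Evaluate Eq. (1) class by class: sum of |E_(i,j)| * psi(i, j)."""
    return sum(count * psi(i, j) for (i, j), count in partition.items())

partition = edge_partition(EDGES)
print(dict(partition))  # {(1, 3): 2, (2, 3): 1, (1, 2): 1}

# math.hypot(i, j) = sqrt(i^2 + j^2), the assumed Sombor edge function.
print(index_from_partition(partition, math.hypot))
```

Grouping edges this way gives the same value as summing Eq. (1) edge by edge, and the partition only needs to be computed once per molecule when several indices are evaluated.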

Table 1 Mathematical formulations of Sombor degree-based TIs.
