Data-driven regression analysis of amylose using Sombor molecular descriptors

Mufti, Zeeshan Saleem; Asim, Muhammad; Shflot, A. S.; Saeed, Syed Tauseef; Younis, Jihad

doi:10.1038/s41598-025-27897-8

Article
Open access
Published: 22 December 2025

Scientific Reports volume 15, Article number: 44294 (2025)

Abstract

Amylose, a vital polysaccharide component of starch, plays a significant role in plant energy storage and has important implications in nutrition and health. In this study, the structural characteristics of amylose are analyzed using Sombor indices, a relatively recent method in topological molecular analysis. Leveraging Euclidean geometry, this work introduces the first area-based Sombor index, offering a novel perspective on the molecular connectivity and spatial configuration of amylose. The third and fifth Sombor indices are derived from perimeter-based geometric principles, introducing a new level of complexity to the topological characterization. In contrast, the second, fourth, and sixth indices are developed using angular-based formulations, enabling a more refined structural interpretation. To assess the relationship between these indices and the physicochemical properties of amylose, regression analysis was performed using supervised machine learning techniques. This statistical modeling uncovered meaningful correlations, enhancing our understanding of how molecular topology relates to chemical behavior. Additionally, Analysis of Variance (ANOVA) was applied to determine the statistical significance of each index. Correlation analyses revealed strong interrelationships among the indices. The results indicate that among all considered Sombor-based indices, SO$_4$ and SO$_5$ are the most effective predictors of amylose’s structural and functional properties. In particular, SO$_5$ exhibited the highest predictive accuracy and model robustness, while SO$_4$ also demonstrated consistent performance, affirming their applicability in molecular modeling. This research underscores the potential of Sombor indices as reliable topological descriptors for molecular classification and offers valuable insights into the physicochemical behavior of amylose.
The findings open new directions for applying topological analysis to the study of biopolymers and polysaccharides, with implications in materials science, biochemistry, and food technology.

Introduction

Chemical graph theory applies the tools of mathematical graph analysis to chemical problems. Within this framework, the chemical and pharmaceutical sciences benefit significantly from topological indices, which are quantitative characteristics derived from graph invariants¹. These parameters are frequently used to predict the physicochemical properties of organic molecules, and many topological indices have been proposed in the literature for different molecular structures². Vertex degrees in a molecular structure are the main source of these indices, which are also referred to as graph-based molecular descriptors. They capture important structural data and have demonstrated value across a wide range of fields, with applications that include molecular identity analyses, quantitative structure–property relationships (QSPR), and quantitative structure–activity relationships (QSAR)³. They have attracted the attention of both mathematicians and chemists because of their mathematical implications and chemical sensitivity. Since H. Wiener’s landmark 1947 work, in which he introduced the Wiener index as the first distance-based topological descriptor, the field has developed rapidly⁴. Approximately 300 distinct topological indices have been developed and cataloged in various databases, demonstrating their widespread utility and continuous evolution in theoretical and computational chemistry⁵. Topological indices capture the characteristics of molecular graphs, providing a mathematical representation of molecular topology⁶. These descriptors are commonly used to estimate important physicochemical properties such as boiling point, melting point, and freezing point.
In contemporary chemical research, evaluating compounds directly through biological assays has become increasingly impractical because of high costs and the need for advanced laboratory infrastructure. This approach also depends heavily on specialized setups, making it unsuitable for large-scale compound screening⁷. As a result, pharmaceutical companies are continually seeking innovative ways to lower research and development costs. Applying topological indices to analyse molecular structures is one attractive technique that makes it feasible to predict chemical properties without expensive equipment or physical laboratories. It offers a cheaper and effective alternative to current experimental techniques. The Sombor index was initially introduced in⁸. It differs from many standard degree-based indices in that it is based on a geometric interpretation. The wide academic interest in this geometric perspective has prompted numerous investigations of its mathematical features and its applications in chemical graph theory⁹. Its formulation offers a novel way of describing molecular graphs by combining vertex degrees with spatial or structural aspects. The Sombor index is distinctive and significant in this area since no other topological index based exclusively on vertex degrees has been published to date with a comparable focus on geometric reasoning¹⁰. Topological indices are quantitative descriptors that are systematically derived from a chemical compound’s structural network and describe its structural properties and atomic relationships. In multiple fields of chemistry, such as cheminformatics, drug discovery, and molecular modelling, these indices are becoming indispensable resources. In these settings, they provide important information on molecular topological structure and interaction patterns¹¹.
Sombor indices, which were initially proposed by Ivan Gutman¹², are an important family of degree-based topological descriptors that provide useful information about the structure and physicochemical properties of molecular systems. These indices, which are defined within the context of graph theory, quantitatively encode the arrangement and interaction of atoms within a molecule, enabling in-depth structural study¹³. Sombor indices have emerged as a prominent class of degree-based topological descriptors due to their mathematical robustness and structural sensitivity¹⁴. Their ability to capture both local and global structural information makes them superior to several classical indices in encoding molecular topology¹⁵. Due to these strengths, Sombor indices have been successfully applied in the prediction of various molecular properties, including boiling point, reflectivity, and toxicity¹⁶,¹⁷. Such models can quantify the effect of molecular structure on the physical and chemical properties of a compound.

Beyond the prediction of basic physical properties, Sombor indices have also proved useful in modelling complex molecular systems and intermolecular interactions. Their chemical applicability across a wide range of compounds reinforces their value in both theoretical and applied chemistry¹⁸. Their strong correlation with experimentally observed properties makes them particularly effective in quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) studies. These indices have been utilized in the analysis of polymers, nanostructures, and dendrimer-based systems, improving prediction accuracy in drug discovery and material design processes¹⁹. In addition, the expected values of Sombor indices have been explored for various classes of chemical graphs, further enriching their theoretical foundation and computational potential²⁰. In the present study, we specifically focus on amylose, a linear polysaccharide composed of $\alpha (1\rightarrow 4)$ linked D-glucose units. Its unique helical structure and physicochemical behavior make it an ideal candidate for topological and structural analysis. Amylose plays a critical role in starch functionality, influencing properties such as gelatinization, retrogradation, and digestibility. By modeling amylose as a molecular graph, topological indices, particularly degree-based indices like the Sombor index, can be applied to understand its structural behavior and predict associated physicochemical characteristics. The mathematical treatment of amylose through topological descriptors provides valuable insights into its potential applications in food science, nutrition, and materials chemistry.

The design of these six Sombor invariants is motivated by geometric analogies such as distances, angles, perimeters, and radii within molecular graphs. These formulations aim to offer refined descriptors that incorporate not only connectivity but also spatial distribution and degree asymmetry. Such features are crucial for modeling real-world biomolecules like amylose, where shape, folding, and interaction patterns influence function. Therefore, each index not only serves a mathematical role but also bears implications for the chemical behavior and structural predictability of biological polymers.

The proposed Sombor-based indices also exhibit direct mathematical connections with well-established indices such as the Forgotten and Zagreb indices. In particular, writing $d_v$ for the degree of vertex $v$, the squared edge terms of the Sombor index $SO(G)=\sum _{uv\in E(G)}\sqrt{d_u^2+d_v^2}$ satisfy the identity

$$\sum _{uv\in E(G)}(d_u^2+d_v^2)=\sum _{v\in V(G)} d_v^3 = F(G),$$

where $F(G)$ is the Forgotten index. This relation enables us to bound the Sombor index in terms of classical descriptors, namely

$$\frac{1}{\sqrt{2}}M_1(G)\le SO(G)\le \sqrt{mF(G)},$$

with $M_1(G)$ denoting the first Zagreb index and $m=|E(G)|$. These bounds follow from the RMS–AM and Cauchy–Schwarz inequalities and hold with equality for regular graphs. Furthermore, if the geometric variant of the Sombor index is defined through the product of degrees, i.e.,

$$SO_{\textrm{geo}}(G)=\sum _{uv\in E(G)} d_ud_v,$$

then it coincides exactly with the second Zagreb index $M_2(G)$. These observations demonstrate that the newly proposed indices are not only novel but also theoretically consistent with existing topological descriptors, thereby strengthening their relevance and applicability.
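
The identities and bounds above can be checked numerically on small graphs. The following is a minimal sketch in Python, assuming the standard definition $SO(G)=\sum _{uv\in E(G)}\sqrt{d_u^2+d_v^2}$. The helper names (`vertex_degrees`, `descriptors`) and the two test graphs (a 4-vertex path and a 6-cycle) are illustrative choices and are not taken from the article.

```python
import math

def vertex_degrees(edges):
    """Return a dict mapping each vertex to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def descriptors(edges):
    """Compute SO, M1, M2 and F (vertex and edge forms) for an edge list."""
    deg = vertex_degrees(edges)
    return {
        "m": len(edges),
        # Sombor index: sum of sqrt(d_u^2 + d_v^2) over edges
        "SO": sum(math.hypot(deg[u], deg[v]) for u, v in edges),
        # First Zagreb index: sum of squared degrees over vertices
        "M1": sum(d ** 2 for d in deg.values()),
        # Second Zagreb index: sum of degree products over edges
        "M2": sum(deg[u] * deg[v] for u, v in edges),
        # Forgotten index, vertex form: sum of cubed degrees
        "F": sum(d ** 3 for d in deg.values()),
        # Forgotten index, edge form: sum of d_u^2 + d_v^2 over edges
        "F_edge": sum(deg[u] ** 2 + deg[v] ** 2 for u, v in edges),
    }

# Illustrative graphs (assumptions, not from the article)
path4 = [(0, 1), (1, 2), (2, 3)]                 # non-regular
cycle6 = [(i, (i + 1) % 6) for i in range(6)]    # 2-regular

for name, edges in (("P4", path4), ("C6", cycle6)):
    d = descriptors(edges)
    lower = d["M1"] / math.sqrt(2)
    upper = math.sqrt(d["m"] * d["F"])
    # Edge and vertex forms of the Forgotten index must agree
    assert d["F"] == d["F_edge"]
    # M1/sqrt(2) <= SO <= sqrt(m*F), up to floating-point tolerance
    assert lower <= d["SO"] + 1e-9 and d["SO"] <= upper + 1e-9
    print(name, "lower:", lower, "SO:", d["SO"], "upper:", upper)
```

For the regular 6-cycle both bounds are attained (up to rounding), whereas for the path both inequalities are strict.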

Exploration and analysis of existing literature

Amylose (see Fig. 1), a linear polysaccharide composed of $\alpha (1 \rightarrow 4)$ linked D-glucose units, constitutes approximately 20–30% of the starch content in most plant sources. Its unique helical structure allows it to form complexes with various molecules, influencing its physicochemical properties and functional applications. The structural characteristics of amylose significantly impact the texture, digestibility, and stability of starch-based foods, making it a focal point in food science and nutrition research.

The physicochemical properties of amylose, such as gelatinization temperature, swelling power, and solubility, are influenced by factors such as amylose content and environmental conditions during processing. Studies have shown that higher amylose content correlates with higher gelatinization temperatures and reduced swelling power, affecting the texture and digestibility of starch-rich foods. For instance, research on rice cultivars with varying amylose content demonstrated significant differences in their thermal and pasting properties, highlighting the role of amylose in determining starch functionality.

Amylose’s ability to form inclusion complexes with lipids and other hydrophobic molecules has been extensively studied. These complexes can modify the digestibility and functional properties of starch. For instance, the formation of amylose–lipid complexes can reduce the glycaemic response to starchy meals, with potential health benefits. Additionally, the complexation behavior of amylose is influenced by factors such as chain length, molecular weight, and processing conditions, which can be tailored to achieve desired functional attributes in food products. Amylose’s retrogradation behavior, in which gelatinised starch molecules re-associate on cooling, is crucial in determining the physical properties and shelf life of starch-based foods. Because amylose retrogrades more rapidly than amylopectin, products such as bread and gels can become firmer and may exhibit syneresis. Optimizing food processing and storage conditions requires a thorough understanding of the kinetics and mechanisms of amylose retrogradation. Studies indicate that amylose content, temperature, and the presence of other compounds can all significantly affect the extent and rate of retrogradation.

Recent developments in QSPR (Quantitative Structure–Property Relationship) modeling have expanded the applicability of topological descriptors to biologically and industrially relevant compounds. For example, molecular graph descriptors have been employed to model anti-Alzheimer agents²¹, revealing methodological parallels with the modeling of biopolymers like amylose. Similarly, the design of anti-biofilm agents²² and anti-HIV compounds²³ using QSPR and computational biomedicine approaches highlights the increasing utility of structural descriptors in understanding molecular functionality. Furthermore, toxicological prediction studies²⁴ reinforce the role of QSPR in evaluating chemical safety, thereby establishing a comprehensive framework where the structural modeling of amylose through topological indices finds broader relevance across pharmacological domains. These advancements support the relevance of our Sombor-based modeling strategy and emphasize its potential integration in future biochemical and medicinal research.

In Table 1, $E^o_i$ represents the $i$-th class of edges, categorized by the degrees of their endpoint vertices. The corresponding frequency is the total number of edges belonging to each class, i.e., the cardinality $|E^o_i|$ of the edge set $E^o_i$.

Table 1 Classification of edges in amylose according to endpoint vertex degrees.
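
An edge classification of this kind can be computed for any molecular graph by grouping edges according to the degrees of their endpoints. The sketch below is a minimal illustration in Python. The function names `edge_partition` and `sombor_from_partition` and the toy graph (a six-membered ring with one pendant vertex) are hypothetical; the toy graph is not the amylose graph and its counts are not the entries of Table 1.

```python
import math
from collections import Counter

def edge_partition(edges):
    """Group edges by the sorted degree pair of their endpoints.

    Returns a dict mapping (d_u, d_v) with d_u <= d_v to the number of
    edges in that class, i.e. the cardinality of each edge class.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    classes = Counter()
    for u, v in edges:
        classes[tuple(sorted((deg[u], deg[v])))] += 1
    return dict(classes)

def sombor_from_partition(partition):
    """Sombor index as a sum over edge classes: count * sqrt(a^2 + b^2)."""
    return sum(count * math.hypot(a, b) for (a, b), count in partition.items())

# Toy graph (hypothetical, NOT amylose): ring on vertices 0..5
# plus a pendant vertex 6 attached to vertex 0.
ring = [(i, (i + 1) % 6) for i in range(6)]
toy_edges = ring + [(0, 6)]

partition = edge_partition(toy_edges)
for degree_pair, count in sorted(partition.items()):
    print(degree_pair, count)
print("SO =", sombor_from_partition(partition))
```

Once the partition is known, any degree-based index whose edge term depends only on the endpoint degrees can be evaluated as a weighted sum over the classes, which is why such tables are convenient for deriving closed-form expressions.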
