Abstract
Complex biological systems often undergo a pre-transition phase prior to the onset of a catastrophic event, during which a sharp and essential shift occurs. There is a pressing need for a swift and effective method to identify this pre-transition phase, or critical state, so that tailored interventions can be implemented in time to prevent irreversible and catastrophic transitions. Nonetheless, identifying the pre-transition phase at the single-sample or single-cell level remains a daunting task in modern clinical medicine, because small sample sizes often undermine the efficacy of traditional statistical methodologies. In this study, we propose a novel critical-state algorithm based on individual sample data, termed single-sample network entropy (SNE), which quantifies the disturbance caused by an individual sample relative to a set of reference samples, thereby revealing the pre-transition phases of biological development at the level of a specific sample. The proposed method successfully identified pre-transition phases in both numerical simulations and eight real-world datasets: an influenza infection dataset, three single-cell datasets (one associated with epithelial-mesenchymal transition (EMT) and two related to embryo development), and four tumor datasets, including esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), and uterine corpus endometrial carcinoma (UCEC). Compared with existing single-sample approaches, SNE is more effective at detecting potential pre-transition phases. Moreover, it identifies two novel types of prognostic indicators, optimistic SNE (O-SNE) and pessimistic SNE (P-SNE) biomarkers, for subsequent practical applications. The reliability of the computational findings is further supported by the functional roles of the signaling biomarkers.
Therefore, we present a novel computational approach that uncovers the pre-transition phases and signaling biomarkers of complex biological processes at the single sample or single-cell level, offering new insights and applications for early personalized biological analysis, including disease diagnosis and prognosis evaluation.
Introduction
Significant changes frequently occur as complex biological processes progress, signifying the critical shift from a relatively normal state to a catastrophic phase1,2. The temporal evolution of complex biological processes is conceptualized as a time-dependent, nonlinear dynamic system, wherein critical transitions resemble phase shifts occurring at bifurcation points3,4. Viewed from a dynamic systems perspective, the progression of biological processes exhibits three phases (Fig. 1A)5,6: a before-transition phase with stable conditions, a pre-transition phase marked by high instability and heightened vulnerability to disturbances, and an after-transition phase representing a highly resilient but often irreversible condition. Many complex diseases, especially cancer, show few symptoms in their initial stages, usually leading to delayed diagnosis until the after-transition phase or deteriorated state becomes clinically evident7,8. Hence, recognizing early indicators of critical transitions and pinpointing the pre-transition phase are essential for ensuring prompt medical action to prevent or slow disease advancement. However, distinguishing the pre-transition phase from the before-transition (relatively normal) phase is challenging because the differences between them are subtle, unlike the stark contrasts typically observed between relatively normal and deteriorated states. Traditional biomarkers primarily focus on discerning stages of deterioration but lack adequate specificity to accurately detect the pre-transition phase9,10,11. Moreover, they predominantly offer static measurements and fail to reflect dynamic disease changes12. Recently, dynamic network biomarker (DNB) theory has been introduced to detect critical signals of abrupt transitions or predict complex biological progression by utilizing a set of DNB molecules that display strong interconnections and significant expression fluctuations13,14,15.
The DNB approach has proven effective in identifying critical states in a range of biological process studies, including those on diabetes mellitus16, endocrine resistance17, and cell differentiation18,19,20. However, the application of DNB theory requires multiple samples to estimate statistical conditions. In real-world settings, collecting multiple samples from each individual is challenging, thereby limiting the practical utilization of traditional multi-sample DNB methods in both biological research and clinical practice21,22,23. In recent years, methods such as mutual information weighted entropy (MIWE)24, network information entropy of edges (NIEE)25, network information gain (NIG)26, and single-sample network module biomarkers (sNMB)27 have allowed for the evaluation of critical states in complex biological processes at the small-sample level, offering a new perspective for developing single-sample critical point methods. Nonetheless, predicting disease deterioration based solely on a single sample is a challenging task because of the high noise levels in high-dimensional time series data, particularly when relying on individual samples. Consequently, there is a pressing need to develop an efficient method for detecting the critical point, or pre-transition phase, at the single-sample level. In this study, by combining the concepts of DNB and information entropy theory, we propose a novel method called single-sample network entropy (SNE) for detecting the pre-transition phase from the perspective of individual samples. Through the computation of network entropy values for the single sample, our proposed method effectively predicts the progression of complex biological processes and uncovers the corresponding signaling biomarkers. Specifically, based on a set of relatively normal samples and the protein-protein interaction (PPI) network, SNE is derived for each individual sample (Fig. 1B).
The proposed SNE can serve as a computational approach to quantify the criticality of complex biological processes at the single-sample level. A notable rise in the SNE score signals an approaching critical point or pre-transition phase (Fig. 1C). To showcase the robustness and effectiveness of our SNE method, we applied it to a numerical simulation and eight real datasets: an influenza infection dataset, three single-cell datasets (one associated with EMT and two related to embryo development), and four tumor datasets (including ESCA, HNSC, and UCEC) from The Cancer Genome Atlas (TCGA) database. As a result, it successfully pinpoints the pre-transition phase of complex biological processes and identifies the corresponding signaling biomarkers at the individual sample level. Therefore, our proposed SNE can provide essential insights for timely intervention using single-sample data, helping to prevent the irreversible and severe progression of biological processes.
A Complex disease progression is conceptualized as occurring in three phases: the before-transition phase, characterized by stable conditions; the pre-transition phase, marked by heightened instability and increased susceptibility to disruptions; and the after-transition phase, which represents a state of high resilience but often irreversible damage. B The SNE score, calculated from an individual sample, provides a quantitative measure of the sample’s statistical deviation compared to a reference set that represents a relatively normal state. C A sharp increase in the SNE score indicates a critical shift, representing the transition to the pre-transition phase of the complex biological system.
Results
Validation of the SNE through numerical simulation
To test the performance and stability of the proposed SNE, we used an eight-node regulatory network (Supplementary Fig. 1) as a model for demonstration. This network (Fig. 2A), driven by eight stochastic differential equations (Eq. (S1) in Supplementary Note 1), models key biological processes such as transcription, diffusion, and translocation using the well-established Michaelis-Menten kinetics. A numerical simulation dataset was created by varying the key parameter \(p\) within the range [−0.5, 0.15]. The value \(p=0\) marks the bifurcation point, indicating a fundamental qualitative change in the system’s dynamics. Specifically, when \(p < 0\), the real parts of all eigenvalues of the Jacobian matrix of Eq. (S1) at the equilibrium are negative, which means that the equilibrium point is initially stable. As the parameter \(p\) approaches 0, the largest real part among all the eigenvalues approaches \(0\) and then becomes positive after passing \(p=0\), implying that the originally stable equilibrium becomes unstable and bifurcates into a qualitatively different periodic solution (that is, a Hopf bifurcation occurs). Reference samples were selected from the parameter range of −0.5 to −0.45, corresponding to a stable state that is distant from the critical transition point (see Supplementary Note 1 for details).
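The stability argument above can be illustrated with a much smaller system. The sketch below is not the eight-node model of Eq. (S1); as an assumption for illustration only, it uses the standard two-dimensional Hopf normal form, whose Jacobian at the origin is [[p, −1], [1, p]], and checks the sign of the largest real part of the eigenvalues as \(p\) crosses zero. The function names are hypothetical.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of the matrix [[a, b], [c, d]] from its characteristic polynomial."""
    half_trace = (a + d) / 2.0
    det = a * d - b * c
    root = cmath.sqrt(half_trace ** 2 - det)
    return half_trace + root, half_trace - root

def max_real_part(p):
    """Largest real part of the Jacobian eigenvalues at the origin.

    Illustrative stand-in system (Hopf normal form), NOT Eq. (S1):
        dx/dt = p*x - y - x*(x^2 + y^2)
        dy/dt = x + p*y - y*(x^2 + y^2)
    Its Jacobian at (0, 0) is [[p, -1], [1, p]], with eigenvalues p +/- i.
    """
    lam1, lam2 = eigenvalues_2x2(p, -1.0, 1.0, p)
    return max(lam1.real, lam2.real)

if __name__ == "__main__":
    for p in (-0.5, -0.1, 0.0, 0.15):
        label = "stable" if max_real_part(p) < 0 else "not stable"
        print(p, max_real_part(p), label)
```

In this toy system the largest real part equals \(p\) itself, so the equilibrium loses stability exactly at \(p=0\), mirroring the qualitative behavior described for the simulation.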
A A network with eight nodes is used to generate the simulation dataset. B The curve of the SNE score shows a sharp increase near the critical point (\(p=0\)). C The overall landscape of local SNE scores, showing how the scores vary across the local networks. D As the system nears the tipping point, a significant shift in node values (Delta Standard Deviation, ΔSD) and edge weights (Delta Pearson Correlation Coefficient, ΔPCC) is observed within the network structure.
Figure 2B illustrates a dramatic increase in SNE values as the system neared \(p=0\), highlighting it as an early indicator of an imminent critical transition. Moreover, to more effectively highlight the differences between the relatively normal and critical states, Fig. 2C presents the SNE scores across 8 nodes. As illustrated, when the system is far from the critical threshold, the SNE scores of all nodes remain stable and low. In contrast, as the system approaches the critical point (\(p=0\)), some nodes exhibit a marked increase in the SNE scores, serving as an early indicator of the forthcoming critical transition. In addition, as presented in Fig. 2D, the dynamic changes in the network were used to demonstrate the difference in the standard deviation (ΔSD) and differential Pearson correlation coefficient (ΔPCC) between the before-transition and critical states. A noticeable alteration in the network structure is observed as the system nears the critical state, serving as a signal for a forthcoming critical transition from the perspective of the network level. Therefore, these results collectively validate the ability of our SNE method to pinpoint critical transitions. Source code for the simulations can be accessed at https://github.com/Chengmushe/SNE_project.
Performance of SNE on the influenza infection dataset
The SNE method was applied to a time-series dataset derived from an experimental study on viral infections. In the original study, 17 participants were exposed to the H3N2/Wisconsin virus through nasal inoculation. Of these, nine individuals (subjects 1, 5, 6, 7, 8, 10, 12, 13, and 15) showed signs of severe infection, while the remaining eight participants did not display any symptoms and stayed healthy28. As presented in Fig. 3A, their gene expression data were gathered from blood taken at sixteen time points over a span of 132 hours, with the time points being −24, 0, 5, 12, 29, 36, 45, 53, 60, 69, 77, 84, 93, 101, and 108 hours. For each individual, the expression values from the first four time points were used as reference data, representing a non-infected, healthy state prior to any viral influence. Using the SNE algorithm outlined in the Methods section, we calculated the individualized SNE scores for each participant across the time points. In Fig. 3B, the individualized SNE scores for the 17 subjects show that an abrupt increase in the SNE score indicates critical signals for the symptomatic participants (red curves), while the scores fluctuate little for the eight asymptomatic participants (blue curves). Furthermore, Fig. 3C illustrates the individualized SNE scores for the nine symptomatic participants, highlighting the pre-clinical phase before the emergence of noticeable symptoms. These results indicate that SNE can reliably reveal critical states in individual cases of influenza virus infection.
A A summary table that provides detailed information about 17 participants from the influenza infection dataset. B SNE score curves for the 17 participants, with the red curves representing the scores for the nine symptomatic individuals and the blue curves indicating the scores for the eight asymptomatic individuals. C SNE score patterns for the nine symptomatic cases, where blue circles indicate the onset of flu symptoms based on clinical observation and pink stars signify the critical states identified by the SNE method.
Performance of SNE on different tumor datasets
To assess the efficacy of the SNE method in revealing the pre-transition or pre-deterioration phase of tumor diseases, it was tested on four tumor datasets (including ESCA, HNSC, and UCEC) from the TCGA database. Tumor-adjacent samples, indicative of relatively healthy states, were used as reference samples, after which the SNE score for each tumor sample was calculated following the algorithm specified in the Methods section. The pre-deterioration phase of tumor diseases at each stage was quantitatively evaluated using the average SNE score. Using the proposed method, the pre-deterioration stages were identified as stage IIA for ESCA, stage III for HNSC, and stage IIB for UCEC (Fig. 4A–C). To verify the identification of the pre-deterioration phase, Kaplan-Meier (log-rank) survival analysis was conducted to compare the prognosis of samples before and after the critical states (Fig. 4D–F). In particular, samples from the before-transition phase generally showed longer survival than those from the after-transition phase. Moreover, when comparing samples from the two stages around the critical point, those from the critical stage showed significantly longer survival than those from the subsequent stage (Supplementary Fig. 2A and Supplementary Fig. 2C). In contrast, no significant differences were observed in the survival curves between the two stages of the after-transition state (Supplementary Fig. 2B and Supplementary Fig. 2D).
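For readers unfamiliar with the survival curves compared here, the following is a minimal sketch of the Kaplan-Meier product-limit estimator written with the Python standard library only. It is a generic illustration, not the analysis pipeline used in this study; the function name and the toy durations are hypothetical, and the log-rank test itself is not implemented.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time of each sample.
    events:    1 if the event (death) was observed, 0 if censored.
    Returns a list of (time, survival_probability) at each observed event time.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)      # samples leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]         # events and censored samples both leave
    return curve

if __name__ == "__main__":
    # Hypothetical toy data (months of follow-up, event indicator).
    for t, s in kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Comparing two such curves (for example, before-transition versus after-transition samples) with a log-rank test would normally be done with an established statistics package rather than hand-written code.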
Critical transitions were identified for the following cancer types: A esophageal carcinoma (ESCA), B head and neck squamous cell carcinoma (HNSC), and C uterine corpus endometrial carcinoma (UCEC). Survival curves are compared between the before-transition and after-transition stages for each cancer type: D ESCA, E HNSC, and F UCEC. The heatmap of local SNE reveals a marked spike in SNE scores for a cluster of signaling genes at the critical point across the following datasets: G ESCA, H HNSC, and I UCEC.
For ESCA, as shown in Fig. 4A, the SNE score sharply increased at stage IIA, marking the onset of tumor invasion into adjacent structures such as the esophageal wall and nearby lymph nodes after stage IIA29. Figure 4D reveals a significant survival curve difference between before-transition and after-transition samples (\(p=0.0001\)), with samples from stages IA-IIA showing notably longer survival than those from stages IIB-IV. In addition, the distribution of sample-level SNE values further illustrates the robustness of the SNE approach (Supplementary Fig. 3). When applied to HNSC, illustrated in Fig. 4B, the SNE score peaked at stage III, signaling an impending critical transition. Figure 4E demonstrates a significant survival difference between before-transition samples (stages I-III) and after-transition samples (stage IV) (\(p < 0.0001\)). In the case of UCEC, as depicted in Fig. 4C, the SNE score spiked at stage IIB, after which metastasis occurred30,31. Figure 4F shows a significant difference in survival curves between before-transition and after-transition samples (\(p=0\)), with stages IA-IIB exhibiting significantly longer survival than stages IIIA-IV. Moreover, to demonstrate the dynamic fluctuations of local SNE scores for signaling molecules from a global viewpoint, we plotted a heatmap of local SNE scores for the top 5% of genes with the highest local SNE scores (Fig. 4G-I), revealing a sharp increase in SNE scores for a cluster of signaling genes at the critical point. In addition, compared to other single-sample methods, such as single-sample Kullback–Leibler divergence (sKLD)32 and comprehensive neighborhood-based perturbed mutual information (CPMI)33, our proposed SNE shows a stronger ability to detect pre-deterioration phases during disease development (Table 1 and Supplementary Fig. 4). The results for the single-cell data related to epithelial–mesenchymal transition (EMT) are provided in Supplementary Note 2 and Supplementary Fig. 5. The above analysis emphasizes the effectiveness of the SNE method in identifying pre-deterioration phases during tumor progression, with SNE scores serving as a reliable quantitative indicator.
Functional analysis of signaling biomarkers
In addition to identifying pre-transition or pre-deterioration phases, SNE also uncovers signaling biomarkers related to prognosis and cancer-associated key biological pathways. Specifically, at the identified critical point, a subset of genes consisting of signaling genes (those in the top 5% based on their highest local SNE values) can be grouped into two categories of prognostic indicators: optimistic SNE (O-SNE) and pessimistic SNE (P-SNE). From a statistical standpoint, O-SNE biomarkers are more frequently observed in cases with favorable prognosis, whereas P-SNE biomarkers are typically found in samples associated with poorer prognosis. A detailed explanation of the selection and identification process (Supplementary Fig. 6) can be described as follows: (\(i\)) samples were dichotomized into long-survival (overall survival > median) and short-survival (overall survival ≤ median) groups based on clinically documented survival duration, using the median survival time as the threshold; (\({ii}\)) For each individual sample, molecules exhibiting the highest 5% of SNE values were extracted to constitute the SNE-signaling gene pool. Samples were dichotomized into two distinct subgroups based on the presence or absence of a specific candidate gene (gene \(i\)) within their respective SNE gene sets: the gene \(i\)-positive subgroup (comprising samples where gene \(i\) was present in their SNE-signaling genes) and the gene \(i\)-negative subgroup (containing samples where gene \(i\) was absent). A gene \(i\) was designated as a candidate biomarker if it satisfied the statistical threshold defined by the formula (\(n\)/(\(n+m\)) \(>\) \(A\) or \(m\)/(\(n+m\)) \(>\) \(A\)), where \(n\) denotes the number of identified samples in the long-survival group, \(m\) denotes the number in the short-survival group, and \(A\) is an empirical parameter set to 0.6; and (\({iii}\)) Candidate biomarkers were further evaluated using Kaplan–Meier survival analysis (log-rank test). 
If the identified subgroup survived significantly longer than the unidentified subgroup (\(p\le 0.05\)), the gene was classified as an optimistic SNE biomarker (O-SNE). Conversely, if the identified subgroup exhibited significantly shorter survival (\(p\le 0.05\)), the gene was classified as a pessimistic SNE biomarker (P-SNE).
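Steps (i) and (ii) of this selection procedure can be sketched as follows. This is a minimal illustration under stated assumptions: each sample is represented as a pair of overall survival time and its set of SNE-signaling genes, the function names and the returned labels are hypothetical, and the toy data are invented. Step (iii), the Kaplan-Meier log-rank test that finally assigns the O-SNE or P-SNE label, is not included.

```python
from statistics import median

def count_identified(samples, gene):
    """Count gene-positive samples in the long- and short-survival groups.

    samples: list of (overall_survival, set_of_signaling_genes).
    Returns (n, m): gene-positive samples with survival > median and <= median.
    """
    cutoff = median(survival for survival, _ in samples)
    n = sum(1 for s, genes in samples if gene in genes and s > cutoff)
    m = sum(1 for s, genes in samples if gene in genes and s <= cutoff)
    return n, m

def candidate_direction(n, m, threshold=0.6):
    """Apply the rule n/(n+m) > A or m/(n+m) > A, with A = 0.6 as in the text.

    Returns which survival group the gene is enriched in, or None if the
    gene is not a candidate. The label strings are illustrative only.
    """
    total = n + m
    if total == 0:
        return None
    if n / total > threshold:
        return "long-survival-enriched"   # candidate for O-SNE testing
    if m / total > threshold:
        return "short-survival-enriched"  # candidate for P-SNE testing
    return None

if __name__ == "__main__":
    # Hypothetical toy cohort: (survival in months, signaling genes of the sample).
    cohort = [(10, {"G1"}), (20, {"G1", "G3"}), (30, {"G2"}),
              (40, {"G2"}), (50, {"G2", "G3"}), (60, {"G2"})]
    for gene in ("G1", "G2", "G3"):
        n, m = count_identified(cohort, gene)
        print(gene, n, m, candidate_direction(n, m))
```

A gene that passes this filter is only a candidate; whether it is reported as an O-SNE or P-SNE biomarker depends on the subsequent log-rank comparison.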
Figure 5A–C illustrates examples of O-SNE and P-SNE biomarkers for the tumor datasets, including ESCA, HNSC, and UCEC, with further details provided in Supplementary Fig. 7 and Supplementary Table 1. These findings suggest that samples identified with O-SNE biomarkers generally show prolonged survival compared to others, while those with P-SNE biomarkers are associated with significantly reduced survival. Moreover, several O-SNE and P-SNE biomarkers have been previously implicated in tumor progression, emphasizing their role in cancer development (Table 2). Additionally, KEGG pathway enrichment analysis of the signaling genes indicated their involvement in tumor progression-related pathways, including the p53 signaling pathway, the cytokine-cytokine receptor interaction pathway, and the JAK-STAT signaling pathway (Fig. 5D–I). For example, the p53 signaling pathway inhibits tumor progression by regulating cell cycle, apoptosis, DNA repair, and metabolism, but its mutation can lead to loss of function or gain of oncogenic functions, thereby promoting tumor initiation and progression34. The cytokine-cytokine receptor interaction pathway supports tumor angiogenesis by promoting the proliferation and migration of vascular endothelial cells, thereby providing nutrients and oxygen to the tumor and facilitating its growth and metastasis35. The JAK-STAT signaling pathway plays a critical role in cancer progression by regulating cell proliferation, immune evasion, and inflammatory responses, and its abnormal activation is closely associated with the initiation, progression, and treatment resistance of various cancers36,37.
A O-SNE biomarkers (HPR) and P-SNE biomarkers (TP53INP1) for ESCA. B O-SNE biomarkers (SMARCAL1) and P-SNE biomarkers (DPH3) for HNSC. C O-SNE biomarkers (API5) and P-SNE biomarkers (NOTCH1) for UCEC. “Identified samples” refer to those containing biomarkers within SNE-signaling genes, while “Unidentified samples” are those without these biomarkers. KEGG enrichment analysis reveals the involvement of signaling genes in cancer-related biological processes: D and G for ESCA, E and H for HNSC, and F and I for UCEC.
Performance of SNE on embryo development datasets
Beyond its application in detecting critical states in complex diseases, our method was further used to analyze the pre-transition dynamics of two single-cell datasets related to embryo development. The first single-cell dataset (hESC-to-DEC)38 describes the differentiation trajectory of human embryonic stem cells (hESC) from a pluripotent state through mesendoderm intermediates toward definitive endoderm cells (DEC) across six time points: 0 h, 12 h, 24 h, 36 h, 72 h, and 96 h. The gene expression profile at 0 h is used as the reference, while the profiles at subsequent time points are treated as case samples. The trajectory toward definitive endoderm (DE) fate commitment occurs after 36 h, followed by definitive endoderm differentiation at 72 h (ref. 38). Figure 6A shows a significant shift in the SNE score at 36 h, which aligns with the findings from the original experiments, where DE fate commitment typically emerges after 36 h. Moreover, the robustness of SNE is supported by an analysis of the distribution of cell-level SNE values (Supplementary Fig. 3). The second single-cell dataset (MEF-to-neuron)39 was generated from the reprogramming of mouse embryonic fibroblasts (MEF) into induced neuronal cells. The differentiation of mouse embryonic intermediate cells into induced neurons occurs at Day 22 (ref. 39). As shown in Fig. 6B, prior to the differentiation event at Day 22, a sharp increase in the SNE score is observed between Day 5 and Day 20, indicating the transition from intermediate embryonic states to induced neurons after Day 20. These results collectively demonstrate that SNE is both scalable and effective for analyzing single-cell datasets related to embryo development. Moreover, we performed Uniform Manifold Approximation and Projection (UMAP) analyses to evaluate SNE-based visualization across different time points. As shown in Fig. 6C, D, the UMAP results effectively distinguish cellular states at various time points. In addition, Fig. 6E–H highlights a significant enrichment of SNE-identified signaling genes in pathways closely associated with embryonic development processes. Finally, SNE is more effective at detecting the critical state during embryonic development than other single-sample methods, such as sKLD32 and CPMI33 (Table 1 and Supplementary Fig. 4).
The dynamic behavior of the SNE was evaluated using two single-cell datasets: A hESC-to-DEC data and B MEF-to-neuron data. Based on the local SNE values of signaling molecules, UMAP is applied to cluster cells for C hESC-to-DEC data and D MEF-to-neuron data. KEGG pathway enrichment analysis of the signaling genes was conducted for E, F hESC-to-DEC data and G, H MEF-to-neuron data.
Discussion
Revealing early signs of imminent deterioration is essential for the management of most complex diseases. However, the limited availability of samples in experimental settings is a widespread issue, often reducing the efficacy of many statistical approaches. As a result, there is a pressing need for innovative methods to overcome the challenges posed by small sample sizes. In this study, we propose the single-sample network entropy (SNE) method, designed to identify critical transition points or pre-transition phases of complex biological processes at the individual sample level. By applying the SNE method to simulated data and eight real-world datasets, we successfully identified the critical transitions that precede impending catastrophic events. Specifically, for the viral infection experimental dataset, notable shifts in the SNE score correspond to the critical phase preceding the appearance of disease symptoms. The pre-deterioration phases for ESCA, HNSC, and UCEC were determined to be stage IIA, stage III, and stage IIB, respectively, with survival analysis indicating that diagnosis before these stages is associated with significantly better prognosis. In addition, our proposed SNE method successfully detected the critical state for three single-cell datasets (one associated with EMT and two related to embryo development).
Our proposed SNE offers several notable advantages. Firstly, unlike conventional approaches that determine the deterioration stage based on gene expression variations, it predicts the pre-transition phase by capturing differences in biomolecular networks between the reference samples and the mixed samples (a combination of reference and case samples), rather than depending exclusively on gene expression data. Secondly, unlike traditional DNB methods that require multiple samples at each time point to compute statistical indices, it not only identifies the pre-transition phase or critical point using just a single case sample or single cell, but also assigns a specific score (local SNE value) to each gene. Moreover, it offers improved performance in comparison with the classical DNB method (Supplementary Fig. 8). Thirdly, it exhibits enhanced effectiveness in detecting pre-transition phases compared to other single-sample methods. For example, in the hESC-to-DEC dataset, our proposed SNE method shows a stronger ability to detect the critical state at 36 h, after which the trajectory toward DE fate commitment begins, followed by definitive endoderm differentiation at 72 h (ref. 38). Similarly, in the MEF-to-neuron dataset, the critical signal identified by our SNE appears more significantly at Day 20, preceding the differentiation event at Day 22 (ref. 39). Fourthly, it not only detects critical signals of transitions into the deterioration state but also uncovers novel prognostic markers, namely O-SNE and P-SNE biomarkers, which hold promise for the identification of prognostic indicators and potential drug targets. Additionally, a major benefit of the SNE method is that it is model-free: it does not rely on model training, thus bypassing the complexities and risks of model selection and tuning errors.
In conclusion, we present a novel computational approach that uncovers the pre-transition phases and signaling biomarkers of complex diseases at the single sample or single-cell level, offering new insights and applications for early personalized biological analysis, such as disease diagnosis and prognosis evaluation. However, our SNE has certain limitations, particularly its dependence on a protein-protein interaction (PPI) network as a background framework and its requirement for a reference group composed of relatively normal samples.
Methods
This study is theoretically based on dynamic network biomarker (DNB) theory2. Generally, the development of complex biological processes can be understood as a time-varying nonlinear dynamic system, with sudden changes interpreted as qualitative shifts in state at critical bifurcation points. The evolution of such biological processes can be classified into three distinct phases: (1) a stable before-transition phase characterized by heightened resilience; (2) a pre-transition phase that is particularly susceptible to disturbances, serving as the critical stage preceding the onset or worsening of the disease; and (3) another stable after-transition phase with heightened resilience. In this study, we present a single-sample network entropy (SNE) method to identify early warning indicators of critical transitions between pre-transition and after-transition phases. According to DNB theory, when a biological system approaches a critical transition point from a relatively stable state, DNB molecules appear and satisfy the following two key statistical conditions2,40:
- A sharp increase in correlation is observed between any two members of the DNB group;
- A dramatic increase in the standard deviation is seen among members within the DNB group.
The above properties indicate that the system’s critical state transition can be signaled by a group of closely interconnected variables exhibiting significant fluctuations at the network level. Specifically, notable fluctuations in the collective features of the network system are induced by significant changes in DNBs and their neighboring molecules. When the system approaches a critical state, significant changes in the structure of the DNB subnetwork can signal an imminent critical transition. Building on this foundation, we propose a single-sample network entropy (SNE) method to detect qualitative state transitions and identify pre-transition phases in complex biological processes.
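The two DNB conditions listed above can be made concrete with a small multi-sample sketch. This is not the SNE computation, which works on a single case sample; it only illustrates the classical DNB statistics that motivate it, namely the average standard deviation of a gene group and the average absolute Pearson correlation between its members. The function names and the toy expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def dnb_statistics(expr):
    """expr: dict mapping gene name -> expression values across samples.

    Returns (average SD, average |PCC|) over the genes in the group.
    """
    avg_sd = mean(pstdev(values) for values in expr.values())
    avg_abs_pcc = mean(abs(pearson(expr[a], expr[b]))
                       for a, b in combinations(expr, 2))
    return avg_sd, avg_abs_pcc

if __name__ == "__main__":
    # Hypothetical toy data: the same three genes in a stable state and
    # near a critical state (four samples each).
    stable = {"g1": [1.0, 1.1, 0.9, 1.0],
              "g2": [2.0, 1.9, 2.0, 2.1],
              "g3": [3.0, 3.0, 3.1, 2.9]}
    critical = {"g1": [1.0, 2.0, 0.2, 1.5],
                "g2": [2.0, 3.1, 1.1, 2.4],
                "g3": [3.0, 4.2, 2.0, 3.6]}
    print("stable:  ", dnb_statistics(stable))
    print("critical:", dnb_statistics(critical))
```

In the toy data both statistics are larger near the critical state, which is the pattern the two conditions describe. Estimating them requires several samples per state, which is the limitation that single-sample approaches aim to remove.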
Pre-transition phase detection based on SNE
Using a collection of reference samples and a single case sample, our SNE method is employed to predict disease progression and detect pre-transition phase at an individual sample level. The detailed steps of this procedure are outlined in the following subsections.
[Step 1] Construction of the global/template network. To construct a global network \({N}^{G}\), genes are mapped onto a protein-protein interaction (PPI) network. Specifically, the PPI network utilized in this study was sourced from STRING (https://string-db.org), with isolated nodes being removed due to their lack of links to other nodes. The SNE index consistently identifies critical points with a similar trend across different confidence thresholds (Supplementary Fig. 9). Consequently, the PPI network \({N}^{G}\) acts as a template network for all individual samples.
[Step 2] Extraction of each local network. Each local network, denoted as \({{LN}}^{k}\ (k=1,2,\cdots ,Q)\), is derived from the global network \({N}^{G}\) and consists of a central gene \({g}^{k}\) along with its neighboring genes \(\left\{{g}_{1}^{k},{g}_{2}^{k},\cdots ,{g}_{M}^{k}\right\}\). If the global network \({N}^{G}\) contains \(Q\) genes, there will be \(Q\) corresponding local networks. For a given case sample, the edge weights between the central gene \({g}^{k}\) and its neighboring genes \(\left\{{g}_{1}^{k},{g}_{2}^{k},\cdots ,{g}_{M}^{k}\right\}\) are \(\left\{{sW}\left({g}^{k},{g}_{1}^{k}\right),{sW}\left({g}^{k},{g}_{2}^{k}\right),\cdots ,{sW}\left({g}^{k},{g}_{M}^{k}\right)\right\}\), which can be obtained using the following formula:
where the term \({x}_{\mathrm{s}}\left({g}^{k}\right)\) represents the expression value of the central gene \({g}^{k}\) in the case sample, and \(\mu \left({x}_{\mathrm{Re}}\left({g}^{k}\right)\right)\) and \(V\left({x}_{\mathrm{Re}}\left({g}^{k}\right)\right)\) indicate the mean and variance of the expression of \({g}^{k}\) in the reference samples, respectively.
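The bookkeeping behind Step 2 can be sketched as follows: one local network per gene of the template network, plus the per-gene reference mean and variance that the edge weights rely on. This is a minimal sketch, not the authors' implementation; the helper names are hypothetical, the use of the sample variance (n - 1 denominator) is an assumption, and the edge-weight formula itself is deliberately not reproduced here.

```python
import statistics


def local_networks(adjacency):
    """Map each central gene g^k to the sorted list of its neighbors."""
    return {center: sorted(neighbors) for center, neighbors in adjacency.items()}


def reference_stats(reference_expr):
    """Per-gene (mean, variance) across the reference samples.

    reference_expr: {gene: [expression values, one per reference sample]}.
    Assumption: sample variance (n - 1 denominator) is used.
    """
    return {
        gene: (statistics.fmean(values), statistics.variance(values))
        for gene, values in reference_expr.items()
    }


toy_adjacency = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
toy_reference = {"A": [1.0, 2.0, 3.0], "B": [4.0, 4.0, 7.0], "C": [0.5, 1.5, 1.0]}

locals_by_center = local_networks(toy_adjacency)  # Q local networks for Q genes
ref_stats = reference_stats(toy_reference)
```

A network with Q genes yields Q entries in `locals_by_center`, matching the statement that there are as many local networks as genes.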
[Step 3] Calculation of the node probability and conditional probability. For each local network \({{LN}}^{k}\ (k=1,2,\cdots ,Q)\), the node probability \(p\left({g}_{j}^{k}\right)\) of the neighbor gene \({g}_{j}^{k}\) \(\left(j=1,2,\cdots ,M\right)\) and the conditional probability \(p\left({g}^{k}\,|\,{g}_{j}^{k}\right)\) of the central gene \({g}^{k}\) with respect to its neighboring gene \({g}_{j}^{k}\) are determined through the following calculations:
With
and
The node probability \(p({g}_{j}^{k})\) defined in Eq. (2) is employed to quantify the extent of expression deviation for the gene \({g}_{j}^{k}\) in a given case sample relative to the reference samples. Specifically, let \({x}_{\mathrm{s}}\left({g}_{j}^{k}\right)\) denote the expression level of gene \({g}_{j}^{k}\) in the case sample S, while \(\mu \left({x}_{\mathrm{Re}}\left({g}_{j}^{k}\right)\right)\) and \(V\left({x}_{\mathrm{Re}}\left({g}_{j}^{k}\right)\right)\) represent the mean and variance of its expression across all reference samples, respectively. The probability \(p({g}_{j}^{k})\) can thus be viewed as the likelihood that the expression of \({g}_{j}^{k}\) in the case sample deviates from the expected distribution inferred from the reference samples. The exponential form used in Eq. (3) reflects the probability density function of a Gaussian distribution, with the mean \(\mu \left({x}_{\mathrm{Re}}\left({g}_{j}^{k}\right)\right)\) and variance \(V\left({x}_{\mathrm{Re}}\left({g}_{j}^{k}\right)\right)\) estimated from the reference expression profile of gene \({g}_{j}^{k}\).
[Step 4] Calculation of the local SNE score. For the local network \({{LN}}^{k}\), the node network entropy \({sNE}^{k}\) of the neighbor gene \({g}_{j}^{k}\) and the conditional entropy \({sCE}^{k}\) of the central gene \({g}^{k}\) for the single case sample are respectively computed as follows:
and
Moreover, to assess the criticality of each local network at a specific time point \(t\), we introduce the local single-sample network entropy (SNE) \({E}_{t}^{k}\) for the local network centered on \({g}^{k}\), defined as follows:
\({E}_{t}^{k}\) can be regarded as an information gain and potentially serves as a quantifiable indicator of network dynamics [41].
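Returning briefly to the Gaussian form mentioned for Eq. (3) in Step 3, the sketch below evaluates a normal probability density for a case-sample expression value, with the mean and variance taken from the reference samples. It only illustrates that ingredient; the function name `gaussian_density` is hypothetical, and any additional normalization or transformation applied in the actual definition of the node probability is not reproduced here.

```python
import math


def gaussian_density(x, mean, variance):
    """Normal probability density at x for the given mean and variance."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)


# Hypothetical reference statistics for one gene: mean 5.0, variance 4.0.
density_near = gaussian_density(5.0, mean=5.0, variance=4.0)  # case value at the reference mean
density_far = gaussian_density(9.0, mean=5.0, variance=4.0)   # case value two standard deviations away
```

The density falls as the case-sample value moves away from the reference mean, which is the sense in which the Gaussian term tracks expression deviation.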
[Step 5] Calculation of the SNE score for the single sample. Focusing on the group of genes with the highest local SNE scores, the SNE score for a single sample at time point \(t\) is computed using the following formula:
where \(Q\) here represents the number of genes whose local SNE scores rank in the top 5%. Varying this tunable parameter within a range does not significantly affect the overall trend of the signal curve (Supplementary Fig. 10), suggesting the robustness of the SNE approach. To further quantify the significance of the detected critical state, we employed a one-sample t-test to assess statistical differences between the before-transition and after-transition phases. Specifically, to assess whether a constant \(x\) differs significantly from the mean of an n-dimensional vector \(X=({x}_{1},{x}_{2},\cdots ,{x}_{n})\), the one-sample test statistic is expressed by the following equation [42]:
\[T=\frac{\bar{X}-x}{s/\sqrt{n}},\]
where \(\bar{X}\) and \(s\) denote the mean and standard deviation of the vector \(X\), respectively. The statistical significance of the difference between \(\bar{X}\) and \(x\) is evaluated by calculating the p-value from the t-distribution. A difference is considered statistically significant if the p-value is below 0.05. In this study, a time point \(t\) was classified as a critical state if two criteria were met: (i) \({E}_{t} > {E}_{t-1}\) and (ii) \({E}_{t}\) shows a statistically significant difference from prior information. A significant increase in the SNE score signals the approach of a critical point, offering an early indication of an impending pre-transition phase.
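The aggregation in Step 5 and the two-part critical-state rule can be sketched in a few lines of Python. Several points are assumptions rather than facts from the text: the top-5% local scores are combined by a plain average, "prior information" is read as the SNE scores at all earlier time points, and significance is decided by comparing |T| with a caller-supplied two-sided critical value `t_crit` (for example 3.182 for 3 degrees of freedom at the 0.05 level) instead of computing a p-value. All function names are hypothetical.

```python
import math
import statistics


def sample_score(local_scores, top_fraction=0.05):
    """Aggregate local scores over the top fraction (assumed: simple mean)."""
    k = max(1, round(len(local_scores) * top_fraction))
    top = sorted(local_scores, reverse=True)[:k]
    return statistics.fmean(top)


def one_sample_t(values, x):
    """One-sample t statistic comparing the mean of `values` with the constant x."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the standard deviation")
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        raise ValueError("standard deviation is zero; the statistic is undefined")
    return (statistics.fmean(values) - x) / (sd / math.sqrt(n))


def is_critical(scores, t, t_crit):
    """Apply criteria (i) E_t > E_{t-1} and (ii) significance against earlier scores."""
    prior = scores[:t]
    rising = scores[t] > scores[t - 1]
    significant = abs(one_sample_t(prior, scores[t])) > t_crit
    return rising and significant


toy_local = list(range(1, 41))          # 40 hypothetical local scores
toy_scores = [1.0, 1.2, 0.9, 1.1, 3.0]  # hypothetical SNE scores over five time points
flag_last = is_critical(toy_scores, t=4, t_crit=3.182)   # 4 prior points, 3 degrees of freedom
flag_third = is_critical(toy_scores, t=3, t_crit=4.303)  # 3 prior points, 2 degrees of freedom
```

When SciPy is available, `scipy.stats.ttest_1samp(prior, popmean=x)` returns the statistic together with a p-value, which can then be compared directly with the 0.05 threshold.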
Data processing and functional analysis
To demonstrate its effectiveness, the SNE method was applied to three single-cell datasets from the GEO database (http://www.ncbi.nlm.nih.gov/geo): differentiation of human embryonic stem cells to definitive endoderm cells (hESC-to-DEC; GEO: GSE75748), differentiation of mouse embryonic fibroblasts to neurons (MEF-to-neuron; GEO: GSE67310), and epithelial-mesenchymal transition (EMT; GEO: GSE147405). For these single-cell datasets, we excluded probes that did not map to NCBI Entrez gene symbols. The expression value of each gene was calculated as the average of all probes that mapped to it. We then selected the top 2000–3000 highly variable genes, a common practice in scRNA-seq workflows that focuses the analysis on the most informative genes. The cells from the first time point were treated as the reference group, representing early-stage conditions that are typically more stable and healthier, before any observable transitions or perturbations occur.
Additionally, we applied our method to four bulk RNA-Seq disease datasets: an influenza dataset (GEO: GSE30550) from the GEO database, along with datasets for esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), and uterine corpus endometrial carcinoma (UCEC) from The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov). The tumor datasets consisted of both tumor and tumor-adjacent tissue samples. Tumor samples were classified by stage based on the available staging information, and samples without staging information were excluded. Tumor-adjacent samples, representing healthier tissue states, served as the reference group. Detailed sampling conditions can be found in Supplementary Note 3. Across all datasets, probes without matching NCBI Entrez gene symbols were removed.
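The probe-level preprocessing described above can be sketched as follows. This is a minimal sketch under stated assumptions: the data layout and function names are hypothetical, and ranking genes by raw variance is a simplification (scRNA-seq toolkits typically rank highly variable genes by a mean-adjusted dispersion instead).

```python
from collections import defaultdict
import statistics


def average_probes(probe_expr, probe_to_gene):
    """Collapse probes to genes by averaging; unmapped probes are dropped.

    probe_expr: {probe_id: [values, one per sample]}.
    probe_to_gene: {probe_id: gene_symbol}; probes missing here are excluded.
    """
    rows_by_gene = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene:
            rows_by_gene[gene].append(values)
    return {
        gene: [statistics.fmean(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


def top_variable_genes(gene_expr, n_top):
    """Return the n_top genes with the largest variance across samples."""
    ranked = sorted(
        gene_expr, key=lambda g: statistics.pvariance(gene_expr[g]), reverse=True
    )
    return ranked[:n_top]


toy_probes = {"p1": [1.0, 3.0], "p2": [3.0, 5.0], "p3": [2.0, 2.0], "p4": [9.0, 9.0]}
toy_mapping = {"p1": "G1", "p2": "G1", "p3": "G2"}  # p4 has no gene symbol

gene_expr = average_probes(toy_probes, toy_mapping)
selected = top_variable_genes(gene_expr, n_top=1)
```

In the study the selection size would be in the 2000–3000 range rather than the single gene kept in this toy example.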
The Kyoto Encyclopedia of Genes and Genomes (https://www.kegg.jp) was used for pathway analysis, while enrichment analysis was conducted with Metascape and the clusterProfiler package [43]. Functional annotations were obtained from tools offered by the Gene Ontology Consortium (http://geneontology.org) and Ingenuity Pathway Analysis software. Cytoscape (www.cytoscape.org) was used to visualize the networks.
Materials availability
Data for esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), and uterine corpus endometrial carcinoma (UCEC) were retrieved from The Cancer Genome Atlas (TCGA) database (http://cancergenome.nih.gov). Influenza infection data (GEO: GSE30550), epithelial-mesenchymal transition (EMT) data from the A549 cancer cell line (GEO: GSE147405), human embryonic stem cell to definitive endoderm cell data (hESC-to-DEC; GEO: GSE75748), and mouse embryonic fibroblast to neuron data (MEF-to-neuron; GEO: GSE67310) were obtained from the Gene Expression Omnibus (GEO) repository (http://www.ncbi.nlm.nih.gov/geo/). The source code for the algorithm can be accessed at https://github.com/Chengmushe/SNE_project.
References
1. Liu, X. et al. Detecting early-warning signals of type 1 diabetes and its leading biomolecular networks by dynamical network biomarkers. BMC Med Genomics 6, S8 (2013).
2. Chen, L. et al. Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci. Rep. 2, 342 (2012).
3. Liu, R. et al. Hunt for the tipping point during endocrine resistance process in breast cancer by dynamic network biomarkers. J. Mol. Cell Biol. 11, 649–664 (2019).
4. Gopalakrishnan, E. A. et al. Early-warning signals for critical transitions in a thermoacoustic system. Sci. Rep. 6, 35310 (2016).
5. Teschendorff, A. et al. Statistical mechanics meets single-cell biology. Nat. Rev. Genet. 22, 459–476 (2021).
6. Southall, E. et al. Early warning signals of infectious disease transitions: A review. J. R. Soc. Interface 18, 20210555 (2021).
7. Hanahan, D. et al. Hallmarks of cancer: The next generation. Cell 144, 646–674 (2011).
8. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
9. Han, C. et al. Development of a dynamic network biomarkers method and its application for detecting the tipping point of prior disease development. Comput Struct. Biotechnol. J. 20, 1189–1197 (2022).
10. McShane, L. et al. Criteria for the use of omics-based predictors in clinical trials. Nature 502, 317–320 (2013).
11. Poste, G. et al. Bring on the biomarkers. Nature 469, 156–157 (2011).
12. Aihara, K. et al. Dynamical network biomarkers: Theory and applications. Gene 808, 145997 (2022).
13. Li, M. et al. Detecting tissue-specific early warning signals for complex diseases based on dynamical network biomarkers: study of type 2 diabetes by cross-tissue analysis. Brief. Bioinform 15, 229–243 (2014).
14. Liu, J. et al. Identifying the critical states and dynamic network biomarkers of cancers based on network entropy. J. Transl. Med 20, 254 (2022).
15. Scheffer, M. et al. Early warning signals of critical transitions. Nature 461, 53–59 (2009).
16. Li, P. et al. Revealing the critical state and identifying individualized dynamic network biomarker for type 2 diabetes through advanced analysis methods on individual basis. Sci. Rep. 15, 3925 (2025).
17. Li, L. et al. Dynamic network biomarker factors orchestrate cell-fate determination at tipping points during hESC differentiation. Innov. (Camb.) 4, 100364 (2022).
18. Sun, R. et al. scGRN-Entropy: Inferring cell differentiation trajectories using single-cell data and gene regulation network-based transfer entropy. PLoS Comput Biol. 20, e1012638 (2024).
19. Wu, X. et al. Network biomarkers, interaction networks and dynamical network biomarkers in respiratory diseases. Clin. Transl. Med 3, 16 (2014).
20. Ye, Y. et al. Using single-cell entropy to describe the dynamics of reprogramming and differentiation of induced pluripotent stem cells. Int. J. Mod. Phys. B 34, 2050288 (2020).
21. Zandie, R. et al. Cellular development follows the path of minimum action. Preprint at arXiv:2504.08096 (2025).
22. Ludwig, J. et al. Biomarkers in cancer staging, prognosis and treatment selection. Nat. Rev. Cancer 5, 845–856 (2005).
23. Suzuki, K. et al. Genetic drivers of heterogeneity in type 2 diabetes pathophysiology. Nature 627, 347–357 (2024).
24. Xie, Y. et al. MIWE: Detecting the critical states of complex biological systems by the mutual information weighted entropy. BMC Bioinforma. 25, 44 (2024).
25. Lyu, C. et al. Detecting tipping points of complex diseases by network information entropy. Brief. Bioinform 25, bbae311 (2024).
26. Yan, J. et al. Disease prediction by network information gain on a single sample basis. Fundam. Res 1, 215–227 (2023).
27. Zhong, J. et al. The single-sample network module biomarkers (sNMB) method reveals the pre-deterioration stage of disease progression. J. Mol. Cell Biol. 14, mjac052 (2022).
28. Scarpino, S. et al. On the predictability of infectious disease outbreaks. Nat. Commun. 10, 898 (2019).
29. Dong, X. et al. LncRNA miR205HG hinders HNRNPA0 translation: Anti-oncogenic effects in esophageal carcinoma. Mol. Oncol. 16, 795–812 (2022).
30. Hruda, M. et al. New staging of endometrial carcinoma – FIGO 2023. Ceska Gynekol. 89, 120–127 (2024).
31. Liu, J. et al. Development and clinical validation of novel 8-gene prognostic signature associated with the proportion of regulatory T cells by weighted gene co-expression network analysis in uterine corpus endometrial carcinoma. Front Immunol. 12, 788431 (2021).
32. Zhong, J. et al. Identifying critical state of complex diseases by single-sample Kullback–Leibler divergence. BMC Genomics 21, 1–15 (2020).
33. Ren, J. et al. CPMI: Comprehensive neighborhood-based perturbed mutual information for identifying critical states of complex biological processes. BMC Bioinforma. 25, 215 (2024).
34. Shen, J. et al. Targeting the p53 signaling pathway in cancers: Molecular mechanisms and clinical studies. MedComm 4, e288 (2023).
35. Propper, D. et al. Harnessing cytokines and chemokines for cancer therapy. Nat. Rev. Clin. Oncol. 19, 237–253 (2022).
36. Xue, C. et al. Evolving cognition of the JAK-STAT signaling pathway: Autoimmune disorders and cancer. Signal Transduct. Target Ther. 8, 204 (2023).
37. Wang, Y. et al. Deciphering JAK/STAT signaling pathway: A multifaceted approach to tumorigenesis, progression and therapeutic interventions. Int Immunopharmacol. 131, 111846 (2024).
38. Chu, L. et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol. 17, 173 (2016).
39. Treutlein, B. et al. Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq. Nature 534, 391–395 (2016).
40. Koizumi, K. et al. Identifying pre-disease signals before metabolic syndrome in mice by dynamical network biomarkers. Sci. Rep. 9, 8767 (2019).
41. Peng, X. et al. Specific network information gain for detecting the critical state of colorectal cancer based on gut microbiome. Brief. Bioinform 25, bbad465 (2024).
42. Rochon, J. et al. A closer look at the effect of preliminary goodness-of-fit testing for normality for the one-sample t-test. Br. J. Math. Stat. Psychol. 64, 410–426 (2011).
43. Yu, G. et al. clusterProfiler: An R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
44. Ito, Y. et al. High level of tumour protein p53-induced nuclear protein 1 (TP53INP1) expression in anaplastic carcinoma of the thyroid. Pathology 38, 545–547 (2006).
45. Muller, A. et al. Retinoic acid and N-(4-hydroxy-phenyl) retinamide suppress growth of esophageal squamous carcinoma cell lines. Cancer Lett. 113, 95–101 (1997).
46. Denisova, E. et al. Frequent DPH3 promoter mutations in skin cancers. Oncotarget 6, 35922–35930 (2015).
47. Zhang, L. et al. Targeting SMARCAL1 as a novel strategy for cancer therapy. Biochem Biophys. Res Commun. 427, 232–235 (2012).
48. Gharaibeh, L. et al. Notch1 in cancer therapy: Possible clinical implications and challenges. Mol. Pharm. 98, 559–576 (2020).
49. Song, K. et al. API5 confers cancer stem cell-like properties through the FGF2-NANOG axis. Oncogenesis 6, e285 (2017).
Acknowledgements
This research was funded by the National Natural Science Foundation of China (12401630), the Educational Commission of Guangdong Province (2023KQNCX073), and the Natural Science Foundation of Guangdong Province (2023A1515110558). We appreciate Prof. Rui Liu for his insightful suggestions regarding the DNB theory and critical points analysis.
Author information
Contributions
J.Z., Z.L., and D.D. conceived the study. C.S., Z.T., and Y.T. conducted the numerical simulations, analyzed the real data, and carried out the functional analysis. All authors contributed to writing the paper and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
She, C., Tang, Z., Tao, Y. et al. Detection of pre-transition phases during biological development using single-sample network entropy (SNE). npj Syst Biol Appl 11, 141 (2025). https://doi.org/10.1038/s41540-025-00623-w