Introduction

As China’s population rapidly ages, the prevention and treatment of multiple chronic diseases among the elderly have become major public health challenges. Around 180 million elderly people in China suffer from multiple chronic conditions, with a comorbidity rate of 65.14% [1,2]. This issue significantly impacts their health and quality of life, making research on chronic disease comorbidities and their formation crucial for effective prevention and treatment strategies.

Existing research on multiple chronic diseases focuses on two areas: relationship research based on multimorbidity and the comorbidity of specific diseases. Multimorbidity refers to an individual having multiple diseases with no priority, while comorbidity involves one disease influenced by others [3].

Some studies defined multiple chronic conditions as multimorbidity, focusing on the coexistence relationship between comorbid diseases and treating disease status as an established outcome. Because multimorbidity reflects this established relationship, many studies use methods such as association rules. Some researchers applied the Apriori algorithm to extract frequent item sets and build association rule models for chronic diseases [4,5]. Li Xiaofeng et al. combined self-organizing map neural networks with K-means clustering to identify common comorbidity patterns in the elderly [6]. Other studies used complex networks to analyze connections between chronic diseases, treating individual diseases as nodes and employing methods such as topological structure analysis and community division to identify comorbidity patterns. For example, Yang Qingqing et al. used the Louvain method to divide the comorbidity network into communities and compared the types, combinations, and prevalence of comorbidities across genders [7]. Li Yanna et al. constructed an ischemic heart disease (IHD) comorbidity network and studied its characteristics and patterns using topological properties, centrality, and community detection [8]. Zhang Qing et al. constructed a comorbidity network using diagnostic information from diabetic patients’ electronic medical records, identifying comorbidity patterns [9]. Bian Zekun et al. analyzed the central and bridge strength of anxiety and depression symptoms in the elderly, offering insights for intervention and treatment [10].

All the above studies share a key limitation: they focus on static connections between individual chronic diseases, overlooking the dynamic formation processes of transitions between comorbid states. These studies primarily address disease outcomes and reveal comorbidity patterns and risks based on individual outcomes, without capturing the specific transition paths between disease states. For instance, network analysis treats the link between diabetes and hypertension as multimorbidity, but this view fails to account for indirect pathways, such as diabetes leading to hypertension via dyslipidemia.

Comorbidity highlights the presence of multiple chronic diseases linked to a specific condition, often developing sequentially. One chronic disease frequently leads to others [11]. Uddin et al. examined which diseases are more likely to cause comorbidities and identified associated risk factors [11], while Russell et al. explored the causal link between comorbidities and COVID-19 outcomes [12]. These studies focus on the development of comorbidities but do not address whether a steady-state distribution exists across all chronic disease systems or whether key disease states influence comorbidity formation.

In view of the shortcomings of the existing literature, this study raises the following questions:

  (1) Is there a typical distribution of chronic disease comorbidities in the elderly? If so, is the distribution stable?

  (2) Are there key chronic disease states in the development of chronic disease comorbidity? If so, to what extent do these critical states contribute to chronic comorbidity?

Using a dependency network and data from the 2015 and 2018 China Health and Retirement Longitudinal Study (CHARLS), we analyzed transitions between comorbidity states. Network topology and centrality analysis revealed important comorbidity nodes, offering theoretical support for controlling morbidity risks. Given the complexity of the chronic disease system, understanding the transitions and key disease states can enhance public health services for the elderly, improving quality of life and health outcomes and reducing morbidity, disability, and mortality rates.

This study contributes to the literature in three key areas: first, we used tracking data to construct a dependency network of chronic diseases in the elderly, which allowed for better identification of comorbidity formation; second, we identified a stable, power-law distribution of comorbidity types, that is, the distribution of comorbidities remained essentially unchanged regardless of the total size of the comorbid population; and third, we uncovered important intermediate disease states and stable comorbidity types in the comorbidity process. These findings enhance our understanding of comorbidity formation in the elderly and offer valuable insights for its prevention and treatment.

The paper is organized as follows: Section “Methods” covers data and methods, Section “Results” presents the construction and analysis of the chronic disease comorbidity network, Section “Discussion” discusses the comorbidity distribution characteristics via simulation, and Section “Conclusion” concludes the study.

Methods

Data processing

We utilized tracking data from CHARLS for the years 2015 and 2018. In 2015, we had 20,042 samples, which included 9,585 males and 10,457 females. In 2018, the sample size decreased slightly to 19,435, with 9,205 males and 10,230 females. After excluding records with missing personal, family, or community identification numbers, as well as year of birth data, we obtained 15,299 samples that were traceable across both 2015 and 2018. Of these, 7,417 were male, and 7,882 were female.

Dependency network

A dependency network is a structure where the state of a node depends on other nodes, represented by a “dependency graph” in graph theory. In such a network, nodes typically represent program results or tasks, and edges indicate data transmission or dependencies. In UML (Unified Modeling Language) modeling, a dependency shows that a change in class A affects class B, meaning B depends on A. For example, traveling from city A to city C via city B can be represented by nodes A, B, and C, with directed edges A→B and B→C. The relationship B→C relies on A→B, highlighting the importance of connection order and edge weight in dependency-based networks.

The transition from being healthy to having a single chronic disease, and eventually multiple diseases, is a time-dependent, multi-state process with state dependency. Direct transitions from a healthy state to multiple diseases are rare, with most following a path from one chronic disease to several. We aim to use dependency networks to model this process, capturing two key aspects: (1) individuals’ health states change with age, and (2) they eventually stabilize in certain disease states.

In the dependency network, disease status changes were represented by nodes, which reflected multidimensional disease states rather than specific chronic diseases. For n chronic diseases, there are theoretically \(2^{n}\) possible health states, ranging from completely healthy (the n-dimensional all-zero vector) to all diseases present (the n-dimensional all-one vector). A comorbidity network in this space should have \(2^{n}\) nodes, each representing a different disease state. Transitions between these states were shown as directed edges, and to capture the sequential dependency, we used time-stamped data. For example, if an individual moved from being healthy in 2015 to having hypertension in 2018, a directed edge from the healthy node to the hypertension-only node showed the transition.

The edge weight in this directed network was given by the frequency of state transitions in the sample, reflecting the dependence between two states. This model no longer views chronic disease progression as irreversible but builds a relationship network based solely on state transitions in the tracking dataset, generating directed edges for reversible processes as well.
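The node and edge construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual pipeline: the three-disease list, the record format, and the function names `encode_state` and `build_transition_edges` are hypothetical, and the sketch assumes that an individual whose state is unchanged between the two rounds contributes no edge.

```python
from collections import Counter

# Hypothetical, shortened disease list; each state is an n-dimensional 0/1 tuple.
DISEASES = ("hypertension", "diabetes", "arthritis")


def encode_state(conditions):
    """Map a set of disease names to a 0/1 tuple (all zeros = completely healthy)."""
    return tuple(1 if d in conditions else 0 for d in DISEASES)


def build_transition_edges(records):
    """Count directed transitions between the 2015 state and the 2018 state.

    `records` holds one (conditions_2015, conditions_2018) pair per tracked
    individual. Assumption: unchanged states produce no edge.
    """
    edges = Counter()
    for before, after in records:
        src, dst = encode_state(before), encode_state(after)
        if src != dst:
            edges[(src, dst)] += 1  # edge weight = transition frequency
    return edges


# Toy records, not CHARLS data.
toy_records = [
    (set(), {"hypertension"}),
    (set(), {"hypertension"}),
    ({"hypertension"}, {"hypertension", "diabetes"}),
    ({"arthritis"}, set()),        # a reversible transition is kept as an edge
    ({"diabetes"}, {"diabetes"}),  # unchanged state, no edge
]
edges = build_transition_edges(toy_records)
```

In this toy input, the edge from the healthy state to the hypertension-only state receives weight 2, and the recovery from arthritis is retained as its own directed edge, mirroring the statement that reversible processes also generate edges.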

Centrality measures

Degree centrality measures a node’s connections, closeness centrality its ability to disseminate information, betweenness centrality its role in information transmission, and eigenvector centrality its overall importance.

Degree centrality \(d(\nu)\) represents the number of direct connections between a disease and others. Diseases with high degree centrality are more interconnected. Degree centrality is split into in-degree \(d_{in}(\nu)\) and out-degree \(d_{out}(\nu)\). High in-degree indicates a stable disease-absorbing state, while high out-degree suggests an unstable state with a higher likelihood of transitioning to other diseases. We also consider edge weights by calculating the weighted in/out degree centrality.
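As a rough illustration of the weighted in/out degree, the sketch below sums edge weights per node. The edge dictionary format, the state labels, and the function name `weighted_degrees` are assumptions made for this example, and no normalization is applied because the text does not specify one.

```python
from collections import defaultdict


def weighted_degrees(edges):
    """Return (weighted in-degree, weighted out-degree) for each node.

    `edges` maps (source_state, target_state) to a transition frequency.
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for (src, dst), weight in edges.items():
        out_deg[src] += weight  # transitions leaving the state
        in_deg[dst] += weight   # transitions arriving at the state
    return dict(in_deg), dict(out_deg)


# Toy weights, not CHARLS data.
toy_edges = {
    ("healthy", "hypertension"): 5,
    ("healthy", "arthritis"): 3,
    ("hypertension", "hypertension+diabetes"): 2,
}
in_deg, out_deg = weighted_degrees(toy_edges)
```

Here the healthy state has weighted out-degree 8 and no in-degree, while hypertension-only has weighted in-degree 5 and out-degree 2.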

Closeness centrality \(C(\nu)\) is the reciprocal of the sum of the shortest path lengths from a node to all others. A high value indicates shorter distances to other nodes, making it easier to connect with other diseases and highlighting its diagnostic importance. Inbound closeness centrality \(C_{in}(\nu)\) reflects how easily other diseases transition to this state, showing strong integration. Outbound closeness centrality \(C_{out}(\nu)\) reflects how easily this disease transitions to others, showing strong influence. The calculation also considers edge weights, where higher weights reduce path cost, facilitating transitions.

Betweenness centrality \(B(\nu)\) measures how often a node acts as an intermediary in the network’s shortest paths. A disease with high betweenness centrality plays a key role in connecting other diseases. The calculation also accounts for edge weights, using the inverse of the edge weight as the path cost.

Eigenvector centrality measures a node’s influence, where connections to high-scoring nodes contribute more to its score. A high score indicates a node is linked to many influential nodes. This centrality highlights a node’s environment, with higher values indicating proximity to sources of risk and requiring extra precautions.

  1. \(PageRank(\nu)\): Since eigenvector centrality is suited for undirected graphs, many nodes in directed graphs have an eigenvector centrality of 0. PageRank improves on this by adapting eigenvector centrality for directed graphs.

  2. \(Hub(\nu)\) and \(Authority(\nu)\): These recursive centralities measure two types of links. A node’s hub score is the sum of its successors’ authority scores, and its authority score is the sum of its predecessors’ hub scores. Hub scores and authority scores are each normalized to sum to 1, with a high hub score indicating an important sender and a high authority score indicating an important receiver.

The in_comprehensive centrality index is expressed as:

$$I_{1}(\nu) = \alpha \cdot d_{in}(\nu) + \beta \cdot C_{in}(\nu) + \gamma \cdot B(\nu) + \delta \cdot PageRank(\nu) + \epsilon \cdot Authority(\nu) \tag{1}$$

The out_comprehensive centrality index is expressed as:

$$I_{2}(\nu) = \alpha \cdot d_{out}(\nu) + \beta \cdot C_{out}(\nu) + \gamma \cdot B(\nu) + \delta \cdot PageRank(\nu) + \epsilon \cdot Hub(\nu) \tag{2}$$
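The two comprehensive indices are weighted sums of five centrality scores, which the following sketch computes from precomputed per-node values. The function name `comprehensive_index`, the dictionary layout, and the toy numbers are hypothetical. The sketch defaults to equal coefficients of 0.2, matching the choice reported in the Results, and it assumes the five measures are already on comparable scales, since the text does not state a normalization step.

```python
def comprehensive_index(measures, weights=None):
    """Weighted sum of centrality measures for every node.

    `measures` maps a measure name to {node: score}. If `weights` is None,
    every measure gets the same coefficient (0.2 when five are supplied).
    """
    names = list(measures)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    nodes = set().union(*(measures[name].keys() for name in names))
    return {
        node: sum(weights[name] * measures[name].get(node, 0.0) for name in names)
        for node in nodes
    }


# Toy scores for two states, A and B (illustrative values only).
in_measures = {
    "in_degree": {"A": 0.5, "B": 1.0},
    "closeness_in": {"A": 0.2, "B": 0.4},
    "betweenness": {"A": 0.0, "B": 0.6},
    "pagerank": {"A": 0.3, "B": 0.7},
    "authority": {"A": 0.1, "B": 0.9},
}
index_in = comprehensive_index(in_measures)
```

The out-comprehensive index follows the same pattern with out-degree, outbound closeness, and hub scores substituted for their inbound counterparts.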

Results

Network construction

We used the 2015 and 2018 tracking data from the CHARLS dataset. A node was generated for each disease state observed in the sample population. Some nodes correspond to states from 2015, some to states from 2018, and others appear in both years. After generating nodes, we created edges between them. If a sample’s disease state changed between the two rounds, a directed edge was formed between the corresponding nodes. The edge’s weight reflected the frequency of this transition across all samples.

Based on the network generation mechanism, we created a chronic disease comorbidity state transition network from 2015 to 2018, as illustrated in Fig. 1. This directed weighted network indicates the transition direction between disease states, with edge weights representing the frequency of these transitions.

Fig. 1 Relationship network diagram based on tracking samples, showing a transfer network of chronic disease states (including all comorbidities present in all samples) constructed using the edge-generation approach of dependency networks.

The sample contained 770 distinct disease states in 2015 and 567 in 2018. After removing duplicates, 1031 nodes (including the fully healthy state) were generated. The network produced 3145 edges, representing distinct disease state transfer paths, each with varying weights. Figure 2 shows the degree and weighted degree distributions of the nodes.

Fig. 2 Degree distribution (left) and weighted degree distribution (right) of all nodes in the disease state transfer network. The inset in each panel shows the distribution over a different x-axis range.

Network scale-free characteristics

The degree and weighted degree distributions of the nodes in the chronic disease comorbidity network show a clear power-law pattern, where a few nodes have large degrees and weights, while most have smaller ones. This suggests the network may be scale-free. To confirm this, we plotted the degree distribution on a double logarithmic scale, which shows a near-linear trend with a long tail, indicating a power-law distribution. Using Python’s power law fitting function and maximum likelihood estimation, we calculated the power exponent and standard deviation. Power-law exponents of scale-free networks typically range from 2 to 3, and the fitted exponents for the degree and weighted degree distributions are shown in Table 1.
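The text does not name the fitting function that was used, so the following is only a sketch of one common approach: the continuous maximum likelihood estimator \(\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{min})\) with standard error \((\hat{\alpha} - 1)/\sqrt{n}\). The function name `fit_power_law_exponent`, the fixed `x_min`, and the synthetic sample are assumptions. Node degrees are discrete, so a dedicated fitting package would normally also select `x_min` and apply a discrete correction.

```python
import math


def fit_power_law_exponent(values, x_min=1.0):
    """Continuous MLE of the exponent of p(x) ~ x^(-alpha) for x >= x_min.

    Returns (alpha_hat, standard_error). This is the continuous
    approximation, not a discrete fit and not the study's exact procedure.
    """
    tail = [x for x in values if x >= x_min]
    if len(tail) < 2:
        raise ValueError("need at least two observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("all observations equal x_min; exponent is undefined")
    alpha = 1.0 + len(tail) / log_sum
    sigma = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, sigma


# Deterministic synthetic sample generated by inverse-CDF from a power law
# with exponent 2.5 (illustrative only, not CHARLS data).
TRUE_ALPHA, N = 2.5, 2000
sample = [
    (1.0 - (k + 0.5) / N) ** (-1.0 / (TRUE_ALPHA - 1.0)) for k in range(N)
]
alpha_hat, sigma_hat = fit_power_law_exponent(sample, x_min=1.0)
```

On this synthetic sample the estimate lands close to the generating exponent of 2.5, which gives a quick sanity check before applying the same estimator to an observed degree sequence.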

Table 1 Degree distribution power law fitting test results.

The results confirm that the chronic disease comorbidity network has scale-free properties, indicating it is a complex, nonlinear system. The power-law distribution of degree and weighted degree shows that a few chronic disease states account for most individuals’ transitions, while many states are rare. Scale-free properties bring both robustness and fragility, meaning that controlling key nodes in the network can weaken the formation path of certain comorbidities. By targeting hub nodes that control key transition paths, the spread of risk along these paths can be significantly disrupted.

Network node centrality measurement

Certain intermediate disease states serve as key nodes in chronic disease transitions. In a dependency network, node centrality indicates a node’s influence. We analyzed out-degree and in-degree centrality, which have distinct meanings: out-degree reflects how often a state transitions to others, while in-degree shows how often other states transition into it. Nodes with high in-degree are stable states, while those with high out-degree act as hubs. Nodes with both high in-degree and out-degree centrality play critical transitional roles. In the chronic disease comorbidity transfer network, node centrality is not strongly skewed toward any one of the five centrality indicators. We therefore used the unweighted average of the five indicators to construct the comprehensive indices; that is, each coefficient in Eqs. (1) and (2) is 0.2.

In-degree centrality represents a node’s ability to absorb comorbidities, while out-degree centrality indicates its role in transferring risk. States with high in-degree centrality include: healthy, only hypertension, only arthritis, only stomach disease, only dyslipidemia, only heart disease, only chronic lung disease, only stroke, only diabetes, only kidney disease, only liver disease, and stomach disease with arthritis, indicating stable absorption points. States with high out-degree centrality, like healthy, only arthritis, only hypertension, only stomach disease, stomach disease with arthritis, only dyslipidemia, hypertension with arthritis, only chronic lung disease, only heart disease, only kidney disease, only diabetes, and hypertension with dyslipidemia, are more unstable, serving as starting points for transitions to other states.

Further analysis shows that disease states with high in-degree and out-degree centrality, such as healthy, only asthma, only arthritis, and combinations like stomach disease with arthritis or chronic lung disease with asthma, serve as key transition points. Individuals frequently move through these states before reaching more stable conditions. States like hypertension with arthritis or dyslipidemia have high out-degree but low in-degree centrality, indicating they are more likely to lead to other diseases. Monitoring and managing these critical transition points can help control comorbidity progression and reduce its prevalence.

Discussion

To further investigate the impact of comorbidity nodes on the distribution of comorbidities, we conducted simulation experiments to model real transitions. The initial distribution of samples across disease states was set to match the 2018 disease state distribution in the CHARLS dataset, using an initial sample of 40,920. At each step, each sample had a certain probability of transitioning from its current state to other disease states. The probability of transitioning from node i to node j at time t + 1 was calculated as follows:

$$P_{ij} = P(i \to j) = \frac{W(i \to j)}{\sum_{k} W(i \to k)} \tag{3}$$

Here, \(\sum_{k} P(i \to k) = 1\), and \(W(i \to j)\) represents the frequency of transitions from node i to node j. This frequency was obtained from the transitions observed between the 2015 and 2018 rounds of the CHARLS dataset.
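The normalization in Eq. (3) and the repeated application of fixed transition probabilities can be sketched as follows. This is a simplified illustration under stated assumptions: it propagates expected population shares instead of sampling individuals, a state with no outgoing edge keeps its mass, and the toy weights include self-transitions so that the chain settles. The function names, state labels, and weights are hypothetical and not taken from CHARLS.

```python
def transition_probabilities(weights):
    """Row-normalize transition frequencies: P_ij = W(i->j) / sum_k W(i->k)."""
    totals = {}
    for (src, _dst), w in weights.items():
        totals[src] = totals.get(src, 0) + w
    probs = {}
    for (src, dst), w in weights.items():
        probs.setdefault(src, {})[dst] = w / totals[src]
    return probs


def simulate(initial, probs, periods=500):
    """Propagate expected population shares for a fixed number of periods."""
    dist = dict(initial)
    for _ in range(periods):
        nxt = {}
        for state, mass in dist.items():
            targets = probs.get(state)
            if not targets:
                # Assumption: a state without outgoing edges retains its mass.
                nxt[state] = nxt.get(state, 0.0) + mass
                continue
            for target, p in targets.items():
                nxt[target] = nxt.get(target, 0.0) + mass * p
        dist = nxt
    return dist


# Toy transition frequencies between three hypothetical states.
toy_weights = {
    ("healthy", "A"): 3, ("healthy", "B"): 1,
    ("A", "A"): 1, ("A", "B"): 1,
    ("B", "A"): 1, ("B", "B"): 3,
}
probs = transition_probabilities(toy_weights)
final = simulate({"healthy": 1.0}, probs, periods=500)
```

In the toy chain the shares converge to one third in state A and two thirds in state B, regardless of how many further periods are run, which is the kind of stable distribution the simulation looks for.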

After setting the total simulation length to 500 periods and keeping the transition probabilities fixed, the samples reach a stable distribution of disease states that no longer changes over time, as shown in Fig. 3. Over time, individuals shift from the blue distribution to the red distribution. The blue line shows the distribution of chronic disease comorbidities at the start, where the most frequent states are only arthritis, only chronic stomach disease, chronic stomach disease with arthritis, and only hypertension. The red line shows the distribution of comorbidities after 500 periods of simulation, where the most frequent states are heart disease with stroke; dyslipidemia, stroke, and arthritis; dyslipidemia with liver disease; liver disease with stroke; and chronic lung disease with stroke.

Fig. 3 Network simulation results, showing the occurrence frequency of each disease state (including comorbidities) at the first and last stages of the dynamic evolution (blue lines: the start phase; red lines: the stable result). The x-axis gives the code of each disease state; across the two rounds, there were 1030 disease states in addition to the completely healthy state.

The statistical characteristics of the initial distribution and the final stable distribution are presented in Table 2, showing that the most common state shifted from “only arthritis” to “heart disease and stroke.” The most prevalent comorbidity states became combinations involving heart disease, stroke, dyslipidemia, chronic liver disease, and kidney disease. The increasing variability indicates more dispersed and unstable comorbidity patterns.

Table 2 Statistical analysis of network simulation results.

To observe the role of certain nodes in disease state transitions, we removed key nodes and simulated the transfer of samples across various states. By comparing the stable distribution before and after node removal, we assessed whether controlling a given node could effectively block disease state transitions and change the distribution. Based on the weighted averages of in-degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality, we removed the top-ranked nodes: (1) only asthma; (2) only arthritis; (3) stomach disease and arthritis; (4) chronic lung disease and asthma; (5) hypertension and arthritis; (6) stomach disease and asthma.
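One way to express the node-removal step is to drop every edge that touches a removed state and then renormalize the remaining outgoing frequencies before re-running the simulation. The sketch below shows only that preprocessing. The function names `remove_states` and `row_normalize`, the state labels, and the weights are hypothetical, and the text does not specify how the mass that previously flowed through a removed state was handled, so renormalization is an assumption.

```python
def remove_states(weights, removed):
    """Drop all edges that start or end at a removed state."""
    removed = set(removed)
    return {
        (src, dst): w
        for (src, dst), w in weights.items()
        if src not in removed and dst not in removed
    }


def row_normalize(weights):
    """Recompute P_ij from the surviving transition frequencies."""
    totals = {}
    for (src, _dst), w in weights.items():
        totals[src] = totals.get(src, 0) + w
    return {(src, dst): w / totals[src] for (src, dst), w in weights.items()}


# Toy transition frequencies, not CHARLS data.
toy_weights = {
    ("healthy", "A"): 3, ("healthy", "B"): 1,
    ("A", "B"): 2, ("A", "C"): 2,
    ("B", "C"): 2,
}
pruned = remove_states(toy_weights, ["B"])
pruned_probs = row_normalize(pruned)
```

After removing state B, state C is still reachable from the healthy state through A, a small-scale version of the multiple-pathway effect examined in the experiments.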

The power law distribution fitting results for each conditional distribution are shown in Table 3, all following the form \(y = a \cdot x^{b}\). The fitting results show that blocking the connection between a single key disease state and the rest of the network does not significantly alter the final stable distribution compared to when no node is removed. While key nodes are important for network pathways, the overall network connectivity does not depend on a single node, and transitions between disease states occur through multiple pathways. This indicates that blocking a single disease state cannot effectively change the overall distribution of the population across all comorbidity states.

We further examined the effect of removing two, three, and up to six nodes on the final distribution of the network. The number of nodes removed did not significantly change the final distribution type of the chronic disease comorbidity network, as shown in Fig. 4.

Fig. 4 Scatter plot of the final comorbidity distribution from the simulation after removing one, two, and up to six nodes. The x-axis gives the code of each disease state; across the two rounds, there were (1030 − k) disease states in addition to the completely healthy state, where k is the number of removed nodes.

This indicates that simply controlling a certain key disease state or several hub states along the formation path is insufficient to prevent or reduce the proportion of certain comorbidities. It does not significantly alter the probability or distribution of comorbidities. Effective prevention and treatment must address the entire comorbidity formation process, highlighting the systematic and complex nature of the comorbidity network.

Table 3 Results of the post-stable distribution fitting after removing some states.

Conclusion

Chronic diseases are a major health concern for the elderly. The progression of chronic disease comorbidity involves individuals transitioning between different disease states over time. Identifying key disease states and determining the final stable distribution of comorbidities can help communities, individuals, and policymakers manage and prevent risks effectively. By incorporating time into the chronic disease comorbidity network, the interconnections and sequence of disease states are revealed, allowing for the identification of key comorbidity states and pathways in the elderly.

This paper showed that the comorbidity state transition network has significant scale-free properties, highlighting the nonlinear complexity of the chronic disease system. Integrating in-degree centrality reveals key disease states in the comorbidity formation path, such as completely healthy, only asthma, only arthritis, stomach disease and arthritis, chronic lung disease and asthma, and hypertension with arthritis. These nodes play a crucial role in state transitions. For individuals, the prevalence of common chronic disease comorbidities can be effectively reduced by controlling comorbidity status, that is, by reducing the total number of comorbidity groups.

Through simulation of the disease state transition process, the stable distribution of samples in various comorbidity states was determined based on a fixed transition matrix. The most common and stable comorbidities include heart disease and stroke, dyslipidemia and stroke, dyslipidemia and chronic liver disease, chronic liver disease and stroke, stroke and kidney disease, and chronic lung disease and stroke. For public health, controlling a single critical disease state does not change the distribution of chronic disease comorbidities in a disease population. This shows that disease risk propagates through multiple pathways, not just one. It highlights the need for interventions targeting the entire comorbidity formation process in the elderly. Additionally, the discovery of multi-path transfers provides a theoretical basis for using higher-order networks to further explore key comorbidity states and risk propagation.

This study’s network does not account for age, gender, or additional rounds of tracking data in the comorbidity formation process. While this simplifies the analysis, considering these factors is crucial for a deeper understanding of chronic disease comorbidity in the elderly and will be a focus of future work.