Use of consensus clustering to identify distinct subtypes of chronic kidney disease and associated mortality risk

Qin, Yi; Xuan, Liping; Wu, Zhe; Deng, Yujie; Liu, Bin; Wang, Shujie

doi:10.1038/s41598-024-81208-1

Download PDF

Article
Open access
Published: 02 December 2024

Use of consensus clustering to identify distinct subtypes of chronic kidney disease and associated mortality risk

Yi Qin¹^na1,
Liping Xuan²^na1,
Zhe Wu¹,
Yujie Deng⁴,
Bin Liu⁵ &
…
Shujie Wang^3,4

Scientific Reports volume 14, Article number: 29893 (2024) Cite this article

4413 Accesses
1 Citations
12 Altmetric
Metrics details

Subjects

Abstract

Background

Chronic kidney disease (CKD) is a complex condition with diverse etiology and outcomes. Utilizing a data-driven clustering approach holds promise in identifying distinct CKD subgroups associated with specific risk profiles for death.

Methods

Unsupervised consensus clustering was utilized to classify chronic kidney disease (CKD) into subtypes based on 45 baseline characteristics in a cohort of 6,526 participants from the US National Health and Nutrition Examination Survey (NHANES) spanning the years 1999–2000 to 2017–2018.We examined the associations between CKD subgroups and clinical endpoints related to mortality, including all-cause mortality, cardiovascular disease mortality, cancer mortality, and mortality due to other causes.

Results

A total of 6,526 individuals with CKD were classified into four clusters at baseline. Cluster 1 (n = 508) comprised patients with relatively favorable levels of cardiac and kidney function markers, lower prevalence of cancer and higher prevalence of obesity, lower medication usage, and younger age. Cluster 4 (n = 2,029) comprised patients with the worst cardiac and kidney function markers. The characteristics of cluster 2 (n = 1,439) and 3 (n = 2,550) fell in between these two clusters. From cluster 1 to cluster 4, we observed a gradual increase in the hazard ratios of all-cause mortality, cardiovascular disease mortality, and mortality due to other causes. Additionally, further sensitivity analysis revealed patient heterogeneity among predefined subgroups with similar baseline kidney function and mortality risks.

Conclusions

Consensus clustering integrated baseline clinical and laboratory measures, revealing distinct CKD subgroups with markedly different risks of death, suggesting that further examination of patient subgroups could advance precision medicine.

Mortality among individuals with chronic kidney disease based on the 2012 and 2021 KDIGO blood pressure targets

Article Open access 18 April 2025

The associations of calf circumference and cardiovascular and all-cause mortality among population with chronic kidney disease stages 3–5

Article Open access 23 July 2025

A clustering approach to improve our understanding of the genetic and phenotypic complexity of chronic kidney disease

Article Open access 26 April 2024

Background

Chronic kidney disease (CKD) involves kidney damage or a reduction in function, typically defined by an eGFR less than 60 mL/min per 1·73 m² or markers of kidney damage like albuminuria¹. CKD poses a significant and escalating global burden, with an estimated 10% of adults worldwide affected by this condition. CKD is independently associated with an elevated risk of all-cause mortality, cardiovascular mortality, and progression to end-stage renal disease^2,3,4. It leads to 1.2 million deaths and 28.0 million years of life lost annually^5,6.

CKD is a complex disorder with a wide range of causes, including systemic illnesses and conditions like hypertension, diabetes, autoimmune diseases, genetic predisposition, and congenital abnormalities^7,8,9,10. Other factors, such as inflammation, exposure to toxins in the environment, and certain medications, can also contribute to the development of CKD^{11,12,13,14,15}. The heterogeneity and complexity of CKD pose challenges to its control and management. The dilemma between health benefits from the intervention and economic consideration in terms of cost-effectiveness calls for re-classification of CKD to enable precise and effective intervention in those at the greatest risk of mortality. In the present study, we hypothesize that distinct subpopulations within individuals with CKD can be identified through the use of multidimensional phenotypic data by performing the consensus clustering analysis^16,17. It is further hypothesized that these subpopulations will have varying risks of future death. We aimed to explore the heterogeneity among CKD patients, and further investigated the differences in mortality risks between clusters.

Results

Study population

Totally, 6,526 individuals with CKD were included in the final analysis. The mean age of study population was 60.3 years, 50.5% of the subjects were women and 58.0% were non-white. The mean eGFR was 95.1 ± 24.2 ml/min per 1.73 m² and the median UACR was 72.5 mg/g (interquartile range, 42.0–181.0 mg/g). The total person-months of follow-up were 591,634 person-months.

Determination of cluster number

Figure 1 shows the matrix heatmaps of the pairwise consensus for each number of cluster analysis (Fig. 1. a-f). The cumulative distribution functions (CDFs) (Fig. 2a) and the proportion increase of the area under the CDFs (Fig. 2b) indicated that to category patients into 4 clusters could best display the profiles of participants with CKD. For 4 clusters, the mean consensus scores were larger than 0.96 for all clusters, with a larger number suggesting better stability of cluster analysis (Fig. 2c).

Distributions of characteristics by clusters

Demographic, anthropometric, behavioral and clinical data for the four clusters are shown in Table 1. The distribution of the 45 baseline variables exhibited statistically significant differences between the four clusters, with the exception of the following variables: not being a citizen of the US, levels of fasting plasma glucose, aspartate transaminase, γ-glutamyl transferase, triglyceride, and uric acid; history of diagnosed diabetes, hypercholesterolemia, congestive heart failure, and treated diabetes (Table 1). The four clusters showed distinctive patterns displayed by standardized means of cluster variables (Fig. 3). Supplementary Table 2 showed the pairwise comparisons of the clustering variables between clusters, and most differences achieved Bonferroni adjusted statistical significance (p < 0.0083). Supplementary Table 3 shows the standard mean differences in age and metabolic risk factors between each two clusters. The distributions of the metabolic-related factors that with large differences (standard mean differences > 0.2) are shown in Fig. 4, and the values of the features were centered to a mean value of 0 and a standard deviation of 1. Cluster 1, including 508 (7.8%) participants, was marked by relatively higher levels of BMI, WC, eGFR, ALT, and DBP, but lower levels of 2 h PG and SBP, and younger age. Cluster 2 comprised 1,439 (22.1%) participants and cluster 3 constituted 2,550 (39.1%) participants. The characteristics of cluster 2 and 3 fell somewhere in between cluster 1 and 4. Cluster 4, included 2,029 (31.1%) participants, was marked by the worst metabolic traits (higher levels of SBP, 2 h PG) and kidney function markers (lowest levels of eGFR).

Table 1 Baseline characteristics of study participants by cluster.

Full size table

Associations of CKD clusters with mortality

Totally, 2,327 participants with CKD at baseline died during the follow-up, including 850 CVD death, 368 cancer death, and 1109 death due to other causes. The Kaplan-Meier survival curves showed that there were significant different risks of all-cause mortality, CVD mortality, and other cause mortality by cluster (log-rank test all P < 0.001) (Fig. 5). However, there was no significant difference in cancer mortality between clusters (P = 0.310) (Fig. 5). The risks of all-cause mortality were significantly higher in cluster 2 (HR, 1.25; 95%CI, 1.02–1.54) and cluster 3 (HR, 1.25; 95%CI, 1.03–1.52), and in cluster 4 (HR, 1.48; 95%CI, 1.22–1.81) compared with cluster 1 (Table 2). Only cluster 4 presented significantly higher risk of CVD mortality (HR, 1.45; 95%CI, 1.05–2.02) and mortality due to other causes (HR, 1.60; 95%CI, 1.20–2.14) compared with cluster 1 (Table 2).

Table 2 Comparison of mortalities between clusters by cox proportional hazards model.

Full size table

Sensitivity analysis

We further performed the sensitivity analysis among the 1,207 individuals with impaired baseline kidney function (eGFR < 45 ml/min per 1.73 m² or UACR ≥ 300 mg/g), we also identified four clusters (Supplemental Fig. 2) with similar cluster consensus scores and proportion of clustered values (Supplemental Fig. 3) as the main analyses. The baseline characteristics of the four predefined clusters are presented in Supplementary Table 4. Similar distributions of characteristics were observed across the predefined clusters. Cluster 1, 2, 3, and 4 consisted of 129 (10.7%), 309 (25.7%), 427 (35.3%), and 342 (28.3%) participants, respectively. Cluster 1 was characterized by relatively higher levels of BMI, WC, eGFR, ALT, DBP, and sodium intake, as well as a younger age. Conversely, cluster 4 was marked by the highest levels of SBP and the lowest levels of eGFR, BMI, and WC. Moreover, participants in cluster 4 exhibited a higher prevalence of stroke, heart attack, and cancer or malignancy, as well as elder age, lower educational attainment, and the lowest sodium intake (Supplementary Table 4). The mortality risks were compared between each cluster. Even though the HRs for all-cause mortality presented increased risk trend by cluster, they did not reach statistical significance. Only cluster 4 had significantly higher risk of mortality due to other causes compared with cluster 1 (HR, 1.65; 95%CI, 1.03–1.81) (Supplementary Table 5).

Another sensitivity analysis was conducted by conducting cluster analysis after excluding highly correlated variables (Supplementary Figs. 4 and 5). Six variables were excluded and 39 variables remained, the results of cluster analysis showed almost the same with the main analysis (Supplementary Table 6).

Discussion

Using unsupervised consensus clustering, we identified four distinct clusters based on 45 baseline variables. These clusters exhibited significant differences in the risk of mortality. Cluster 4, characterized by older age, higher levels of SBP, 2 h PG, lower levels of eGFR, and higher prevalence of CVD, had the highest risk of future mortality. Stratification by eGFR and UACR confirmed patient heterogeneity within the overall population. An understanding of which CKD phenotypes relate to mortality will guide both mechanistic research and promote precision medicine for CKD management.

In this study, the participants exhibited a high prevalence of obesity, as indicated by a BMI of ≥ 30 kg/m² and a WC of ≥ 103 cm. Cluster 1 has the highest levels of BMI and WC, whereas Cluster 4 shows the opposite pattern in our study. Existing evidence suggests a positive association between general adiposity (represented by BMI) and central adiposity (represented by WC) with the risk of all-cause mortality in the general population^18,19. It has been observed that both higher and lower BMI values are associated with an increased risk of mortality, with obesity (BMI ≥ 30 kg/m²) being responsible for the majority of the mortality burden¹⁸. However, interestingly, in our study, cluster 4, which had the lowest levels of BMI and WC, presented the higest with mortality risk. Previous studies have indicated that the association between BMI and mortality risk weakens substantially with increasing age^20,21. Similar findings have been observed for waist circumference in participants aged 60 years and older¹⁹. This suggests that increased weight may actually confer a survival advantage in older individuals²⁰. In our study, the participants were predominantly elderly, with a mean age of ≥ 59 years. This age composition may have attenuated the association between obesity and the risk of all-cause mortality. Overall, our findings highlight the complex relationship between obesity, age, and mortality risk, emphasizing the need for further investigation to better understand the impact of obesity on mortality in different age groups.

The patients in cluster 1 had much higher sodium intake than those in other clusters. The patients in cluster 1 are younger and more likely to be men. Generally, men and younger people have higher energy requirements than women and older people. A correlation between sodium and energy intake is evident. In addition, low sodium intake may suggest a lower nutritional status, which is consistent with our findings that patients in cluster 1 have the lowest mortality risk. Another study also reported a correlation between higher sodium intake and lower all-cause mortality risk in American individuals²².

As we are aware, hypertension is the most common comorbidity observed in patients with CKD²³. In our present study, cluster 4 was characterized by a higher prevalence of hypertension and poorer kidney function. Uncontrolled hypertension in CKD patients can lead to detrimental clinical outcomes, such as myocardial infarction, acute coronary syndrome, ischemic stroke, heart failure, and even death²⁴. Previous research has shown that SBP has a stronger association with adverse kidney outcomes than DBP in CKD patients²⁵. In cluster 4, we observed higher levels of SBP and relatively lower levels of DBP, suggesting that SBP might be a more potent predictor of CKD progression and have a greater impact on future mortality in CKD patients compared to DBP. In conclusion, our findings emphasize the importance of effectively managing blood pressure in patients with CKD.

Our study focused on patients belonging to cluster 4, characterized by poorer kidney function (indicated by lower levels of eGFR) and a higher prevalence of CVD, including coronary heart disease and heart attacks. This particular group demonstrated an elevated risk of future mortality. It is widely acknowledged that eGFR and albuminuria are strongly and independently associated with various adverse outcomes, such as progression to kidney failure, cardiovascular events, and death¹. Our study confirmed these previous findings by establishing a significant association between decreased eGFR and the risk of all-cause mortality. Prior research has consistently demonstrated that CVD is a leading cause of death among patients with CKD¹. In line with these findings, our study revealed that higher-risk groups for cardiovascular events experienced higher overall mortality rates and an increased risk of CVD-related mortality. These results underscore the importance of reducing cardiovascular risk factors to slow down CKD progression and strive to prolong the lifespan of patients with CKD. By prioritizing strategies aimed at managing CVD risk factors in this population, we can potentially improve outcomes, reduce mortality rates, and enhance the overall quality of life for individuals with CKD.

Our study highlights the significant heterogeneity observed within the broad category of CKD, which encompasses various pathologies and etiologies. In order to explore this diversity, we employed consensus clustering using discrete data elements. The use of consensus clustering allowed us to identify distinct patient groups with specific clinical markers. Of particular interest is the stratification of patients based on their kidney function, distinguishing between those with relatively preserved or impaired kidney function. This stratification provides valuable insights for tailoring clinical interventions and treatments in a more precise manner. However, it is important to note that further clinical studies are required to validate the utility of this approach. These studies should investigate whether the identified patient groupings indeed lead to more targeted interventions and improved clinical outcomes, thus fulfilling the promise of precision medicine.

The study has several limitations that need to be considered. Firstly, it’s important to acknowledge that cluster analysis is an exploratory data analysis method. It relies on the input of data, such as the 45 baseline variables used in our analysis, to identify clusters of individuals with similar characteristics. Therefore, if different patient characteristics were used as input variables, it is likely that different CKD subgroups would be identified. It’s essential to interpret the results of cluster analysis within this context and understand that the identified subgroups are specific to the variables used in the analysis. Second, our clustering analysis relied mainly on well-known clinical traits associated with CKD, some other factors may also contribute to identifying sub-types of CKD. Future studies should explore the inclusion of novel biomarker data, such as inflammatory markers, genomics, metabolomics, and proteomics, to enhance our understanding of CKD heterogeneity. Finally, our study population was limited to the US population. Therefore, our findings may not be generalizable to other populations. It is essential to validate our results in a broader range of populations in the future.

By using the probably associated factors of CKD, our study demonstrates that substantial heterogeneity with sophisticated phenotyping and underlying disease pathologies exists within the broad category of “CKD”. Our findings also added new information to the 2012 KDIGO CKD categorization guideline. The underlying heterogeneity in CKD staging by eGFR and urine ACR is not adequately captured. Patients with CKD can be subtyped into four separate subgroups based on 45 baseline features by applying data-driven clustering to multidimensional patient data, including demographics, biomarkers from blood and urine, health status and behaviors, and medication use. These subgroups are associated with future mortality risks. Identification of clinically subgroups among CKD patients provides an important step toward patient classification and management. Further research is needed to validate and refine these clusters, explore their reproducibility, and investigate underlying mechanisms. These findings contribute to a deeper understanding of CKD and have implications for improved patient outcomes.

Methods

Study population

The US National Health and Nutrition Examination Survey (NHANES) is an ongoing cross-sectional, national, stratified, multistage probability surveys of the civilian, noninstitutionalized US population²⁶. About 10 thousand individuals in each survey for every 2 years are investigated to complete a household interview and underwent a physical examination. In present study, we used data from NHANES III 1999–2000 to 2017–2018 which included 10 cycles of survey. All nonpregnant participants with 20 years or older and CKD were included in the analysis. A detailed description of the NHANES database is publicly available (http://www.cdc.gov/nchs/nhanes.htm).

We combined ten consecutive survey cycles which included 101,316 participants. Participants were excluded for aged younger than 20 (n = 46,235), being pregnant at examination or uncertain of the pregnancy status (n = 2,639), and having received dialysis treatment in the past 12 months (n = 162). Participants were also excluded due to missing data on mortality or outliers of cluster variables. Finally, a total of 6,526 eligible subjects with CKD were included in the analysis (Supplementary Fig. 1). The NHANES protocol was approved by the National Center for Health Statistics (NCHS) Institutional Review Board and written informed consent was obtained. Our study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline²⁷.

Mortality data

NHANES III participant records were linked to mortality data from the National Death Index based on death certificate data (https://www.cdc.gov/nchs/data-linkage/mortality-public.htm). International Classification of Diseases, Tenth Revision codes were used to identify the cause of deaths. Cardiovascular death includes death due to diseases of heart (I00-I09, I11, I13, I20-I51) and cerebrovascular diseases (I60-I69). Cancer death was classified using codes C00-C97. Person-months were calculated in months from the date of interview to date of death or most recent vital status record.

Variables

Sociodemographic characteristics, behavioral risk factors and history of diseases were administered in the survey by trained interviewers using questionnaires. The physical examinations and laboratory tests in NHANES took place in a mobile examination center using standardized protocols and calibrated equipment, and details on the data collection are described on the website (https://wwwn.cdc.gov/nchs/nhanes/analyticguidelines.aspx). We selected 45 clinically available and novel factors from over all variables that were measured at NHANES study baseline. Variables were selected on the basis of literature review for those that are most clinically relevant to CKD^{28,29,30,31,32}. We excluded variables with over 10% missing data or small variability (e.g., binary variable with < 5%). The 45 variables included variables of sociodemographic characteristics (n = 9), behavioral risk factors (n = 5), biomarkers of metabolic status (n = 19), and history of diseases (n = 12).

The sociodemographic characteristics included ethnicity, education level, family income, marriage status, citizenship status, housing, employment, health insurance, and regular health care access. The ethnicity was categorized into non-Hispanic white and non-white. Low education attainment was defined as attaining less than a high school education. The income-to-poverty ratio (annual family income divided by the poverty threshold adjusted for family size and inflation) was used as a measure of income. The low income-to-poverty ratio was defined as less than 100%. The marriage status was dichotomized as currently married and not married. The citizenship status includes two options, citizen by birth or naturalization and not a citizen of the US. For investigating housing status, the participants were asked “Is this home owned, being bought, rented, or occupied by some other arrangement by you or someone else in your family?” Employed status was dichotomized as unemployed and employed, student, or retired. The type of health insurance was also dichotomized as with and without health insurance. The participants were asked “Is there a place that you usually go when you are sick, or do you need advice about your health?” for investigating the health care access.

The behavioral risk factors included currently smoking status, currently drinking status, physical activity level, sleep duration and sodium intake. Current smoking was defined as having smoked at least 100 cigarettes in life and smoking at present. Current alcohol drinking was defined as taking at least 12 times drinks of any type of alcoholic beverage in the last 12 months. Physical activity was estimated using the form of the Global Physical Activity Questionnaire by asking questions about the intensity, duration, and frequency of physical activity. There were different types of physical activity assessment tools used in NHANES 1999–2000 to 2005–2006 and NHANES 2007–2008 to 2017–2018. In NHANES 1999–2000 to 2005–2006, the duration of the physical activity was not ascertained, each physical activity was assigned an intensity value (metabolic equivalent tasks) that represents the ratio of the energy expenditure of the activity to the basal metabolic rate. In NHANES 2007–2008 to 2017–2018, total metabolic equivalent minutes per week were calculated as the measurement of physical activity level for the subjects. A higher level of physical activity was defined as having a higher metabolic equivalent/week than the median levels of the metabolic equivalent/week by investigation cycles. The usual sleep duration at night was investigated and long sleep duration was considered as sleep longer than 8 h. The sodium intake was collected through dietary interview.

The physical examinations and laboratory tests of metabolic biomarkers were collected using standardized protocols and assays, including body mass index (BMI), waist circumference (WC), systolic blood pressure (SBP), diastolic blood pressure (DBP), HbA1c, fasting plasma glucose (FPG), 2 h postprandial glucose (2 h PG), alanine transaminase (ALT), aspartate transaminase (AST), γ-glutamyl transferase (GGT), triglyceride (TG), high-density lipoprotein cholesterol (HDL-cholesterol), total cholesterol, low-density lipoprotein cholesterol (LDL-cholesterol), C-reactive protein (CRP), serum albumin, uric acid (UA), Urinary albumin creatinine ratio (UACR), and eGFR.

The information on currently taking prescribed medicine for treating hypertension, diabetes, and hypercholesterolemia was investigated in the survey. The history of diseases, including congestive heart failure, coronary heart disease, heart attack, stroke, and cancer or malignancy was also collected.

Definition of CKD

The eGFR was calculated using the 2009 chronic kidney disease epidemiology collaboration (CKD-EPI) equation with considering the sex, serum creatinine level and race³³. Albuminuria was calculated using urinary albumin divided by the urinary creatinine based on morning spot urine. CKD was defined as eGFR level < 60 ml/min/1.73 m² or UACR ≥ 30 mg/g. Equations expressed for specified sex and serum creatinine level were showed in the Supplementary Table 1.

Statistical analysis

We employed multiple imputation with arbitrary missing patterns to correct for response bias under the assumption of missing at random, and to maximally utilize existing risk factor data. The continuous variables with skewed distributions were log-transformed to normal distributions in the imputation process. The linear regression method was used to impute missing values for continuous variables, and the logistic regression method for variables having binary or ordinal responses. For each variable with missing data, we used the other variables to impute.

We performed consensus clustering analysis on the participants with chronic kidney disease and the continuous values were centered to a mean value of 0 and a standard deviation of 1. The clustering algorithm is to maintain high cluster consensus while maximizing the number of clusters³⁴. With the prespecified setting a number of clusters K = 2, 3, …, 7, the consensus clustering algorithm generated a random subset that contained 80% of the data records without replacement and repeated 100 times for each number of clusters. For each random subset, the K-means (Euclidean distance-based) algorithm was conducted while each individual was assigned to one of the clusters. The frequencies of any pair of two individuals were calculated after 100 iterations, which were grouped together under each scenario of K and constructed a matrix of participantsˈ pairwise consensus value³⁴. In the consensus matrix, consensus values ranged from 0 (never clustered together) to 1 (always clustered together) were marked by white to bright blue. For each number of cluster analysis, the cluster memberships are marked by colored rectangles. The consensus matrix is ordered by the consensus clustering which is displayed as a dendrogram atop the heatmap.

The optimal number of clusters was ascertained by observing the consensus matrix heat map, the within-cluster consensus scores, and the cumulative distribution function (CDF) (range 0–1) plot³⁴. The CDF plot showed the area under the CDFs for each K, and for a specific number of clusters, the CDF reached an approximate maximum, thus consensus and cluster confidence was at a maximum at this K. The relative change in area under the CDF curve comparing K and K − 1 was also used to determine the optimal number of clusters. The cluster consensus score, ranged between 0 and 1, was defined as the average consensus value for all pairs of individuals belonging to the same cluster. A value approached to 1 indicated better cluster stability³⁴.

For continuous variables with normal distribution, we calculated mean and standard deviation; for continuous variables with skewed distributions, we calculated median and interquartile range; and for categorical variables, we presented count and percentage. To present the cluster profiles of the 45 variables, we graphically displayed the standardized means of continuous variables (metabolic risk factors and sodium intake) and proportions for categorical variables (the other variables) by cluster. The frequencies of endpoints related to death were calculated as the number of events divided by personmonths of observation censored at the date of event occurrence, death, or follow-up visit, whichever came first. Adjusted Cox proportional hazard models were used and hazard ratios (HRs) with 95% confidence intervals (CIs) were calculated to estimate the risks for all-cause mortality, CVD mortality, cancer mortality and mortality due to other causes by cluster.

All the statistical analysis was conducted using the R version 4.2.3 (http://www.r-project.org). Consensus clustering analysis was done using the ConsensusClusterPlus function (minimum K = 2, maximum K = 7, replication = 100, proportion of random subset = 0.8, Euclidean distance-based K-means algorithm) in the ‘ConsensusClusterPlus’ package in R version 4.2.3 (http://www.r-project.org).

Data availability

All data are fully available on request from the corresponding author.

References

Kalantar-Zadeh, K., Jafar, T. H., Nitsch, D., Neuen, B. L. & Perkovic, V. Chronic kidney disease. Lancet 398 (10302), 786–802 (2021).
Article CAS PubMed Google Scholar
Astor, B. C. et al. Lower estimated glomerular filtration rate and higher albuminuria are associated with mortality and end-stage renal disease. A collaborative meta-analysis of kidney disease population cohorts. Kidney Int. 79 (12), 1331–1340 (2011).
Article CAS PubMed Google Scholar
Chronic Kidney Disease Prognosis Consortium;et al et al. Association of estimated glomerular filtration rate and albuminuria with all-cause and cardiovascular mortality in general population cohorts: a collaborative meta-analysis. Lancet 375 (9731), 2073–2081 (2010).
Article Google Scholar
Gansevoort, R. T. et al. Lower estimated GFR and higher albuminuria are associated with adverse kidney outcomes. A collaborative meta-analysis of general and high-risk population cohorts. Kidney Int. 80 (1), 93–104 (2011).
Article CAS PubMed PubMed Central Google Scholar
GBD Chronic Kidney Disease Collaboration. Global, regional, and national burden of chronic kidney disease, 1990–2017: a systematic analysis for the global burden of Disease Study 2017. Lancet 395 (10225), 709–733 (2020).
Article Google Scholar
Xie, Y. et al. Analysis of the Global Burden of Disease study highlights the global, regional, and national trends of chronic kidney disease epidemiology from 1990 to 2016. Kidney Int. 94 (3), 567–581 (2018).
Article PubMed Google Scholar
Tervaert, T. W. et al. Pathologic classification of diabetic nephropathy. J. Am. Soc. Nephrol. 21 (4), 556–563 (2010).
Article PubMed Google Scholar
Cheung, A. K. et al. Effects of intensive BP Control in CKD. J. Am. Soc. Nephrol. 28 (9), 2812–2823 (2017).
Article MathSciNet PubMed PubMed Central Google Scholar
Kurts, C., Panzer, U., Anders, H. J. & Rees, A. J. The immune system and kidney disease: basic concepts and clinical implications. Nat. Rev. Immunol. 13 (10), 738–753 (2013).
Article CAS PubMed Google Scholar
Parsa, A. et al. APOL1 risk variants, race, and progression of chronic kidney disease. N Engl. J. Med. 369 (23), 2183–2196 (2013).
Article CAS PubMed PubMed Central Google Scholar
Carrero, J. J. et al. Comparison of nutritional and inflammatory markers in dialysis patients with reduced appetite. Am. J. Clin. Nutr. 85 (3), 695–701 (2007).
Article CAS PubMed Google Scholar
Soderland, P., Lovekar, S., Weiner, D. E., Brooks, D. R. & Kaufman, J. S. Chronic kidney disease associated with environmental toxins and exposures. Adv. Chronic Kidney Dis. 17 (3), 254–264 (2010).
Article PubMed Google Scholar
Wright, J. T. Jr et al. Effect of blood pressure lowering and antihypertensive drug class on progression of hypertensive kidney disease: results from the AASK trial. JAMA 288 (19), 2421–2431 (2002).
Article CAS PubMed Google Scholar
Xie, Y. et al. Proton Pump inhibitors and risk of Incident CKD and Progression to ESRD. J. Am. Soc. Nephrol. 27 (10), 3153–3163 (2016).
Article CAS PubMed PubMed Central Google Scholar
Cacoub, P., Desbois, A. C., Isnard-Bagnis, C., Rocatello, D. & Ferri, C. Hepatitis C virus infection and chronic kidney disease: time for reappraisal. J. Hepatol. 65 (1 Suppl), S82–S94 (2016).
Article PubMed Google Scholar
Soria, D. et al. A methodology to identify consensus classes from clustering algorithms applied to immunohistochemical data from breast cancer patients. Comput. Biol. Med. 40 (3), 318–330 (2010).
Article CAS PubMed Google Scholar
Zheng, R. et al. Data-driven subgroups of prediabetes and the associations with outcomes in Chinese adults. Cell. Rep. Med. 4 (3), 100958 (2023).
Article CAS PubMed PubMed Central Google Scholar
Bhaskaran, K., Dos-Santos-Silva, I., Leon, D. A., Douglas, I. J. & Smeeth, L. Association of BMI with overall and cause-specific mortality: a population-based cohort study of 3·6 million adults in the UK. Lancet Diabetes Endocrinol. 6 (12), 944–953 (2018).
Article PubMed PubMed Central Google Scholar
Jayedi, A., Soltani, S., Zargar, M. S., Khan, T. A. & Shab-Bidar, S. Central fatness and risk of all cause mortality: systematic review and dose-response meta-analysis of 72 prospective cohort studies. BMJ 370, m3324 (2020).
Article PubMed PubMed Central Google Scholar
Lv, Y. et al. The obesity paradox is mostly driven by decreased noncardiovascular disease mortality in the oldest old in China: a 20-year prospective cohort study. Nat. Aging. 2 (5), 389–396 (2022).
Article PubMed Google Scholar
Ng, T. P. et al. Age-dependent relationships between body mass index and mortality: Singapore longitudinal ageing study. PLoS One. 12 (7), e0180818 (2017).
Article PubMed PubMed Central Google Scholar
Liu, D. et al. Sodium, potassium intake, and all-cause mortality: confusion and new findings. BMC Public. Health. 24, 180 (2024).
Article CAS PubMed PubMed Central Google Scholar
Kim, H. et al. Baseline Cardiovascular characteristics of adult patients with chronic kidney disease from the KoreaN Cohort Study for outcomes in patients with chronic kidney Disease (KNOW-CKD). J. Korean Med. Sci. 32 (2), 231–239 (2017).
Article PubMed Google Scholar
James, P. A. et al. 2014 evidence-based guideline for the management of high blood pressure in adults: report from the panel members appointed to the Eighth Joint National Committee (JNC 8). JAMA 311 (5), 507–520 (2014).
Article ADS CAS PubMed Google Scholar
Lee, J. Y. et al. Association of blood pressure with the progression of CKD: findings from KNOW-CKD Study. Am. J. Kidney Dis. 78 (2), 236–245 (2021).
Article PubMed Google Scholar
Curtin, L. R. et al. The National Health and Nutrition Examination Survey: Sample Design, 1999–2006. Vital Health Stat. 2 ;(155):1–39. (2012).
von Elm, E. et al. The strengthening the reporting of Observational studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Ann. Intern. Med. 147 (8), 573–577 (2007).
Article Google Scholar
Kronenberg, F. Emerging risk factors and markers of chronic kidney disease progression. Nat. Rev. Nephrol. 5 (12), 677–689 (2009).
Article CAS PubMed Google Scholar
Kao, H. Y. et al. Associations between sex and risk factors for Predicting chronic kidney disease. Int. J. Environ. Res. Public. Health. 19 (3), 1219 (2022).
Article CAS PubMed PubMed Central Google Scholar
Geylis, M., Coreanu, T., Novack, V. & Landau, D. Risk factors for childhood chronic kidney disease: a population-based study. Pediatr. Nephrol. 38 (5), 1569–1576 (2023).
Article PubMed Google Scholar
Kareem, S. et al. Epidemiology and risk factors of chronic kidney Disease in Rural areas 4 (Badin) of Sind, Pakistan. J. Pak Med. Assoc. 73 (7), 1399–1402 (2023).
Article PubMed Google Scholar
Xie, Y. & Chen, X. Epidemiology, major outcomes, risk factors, prevention and management of chronic kidney disease in China. Am. J. Nephrol. 28 (1), 1–7 (2008).
Article MathSciNet PubMed Google Scholar
Stevens, P. E., Levin, A. & Kidney Disease: Improving Global Outcomes Chronic Kidney Disease Guideline Development Work Group Members. Evaluation and management of chronic kidney disease: synopsis of the kidney disease: improving global outcomes 2012 clinical practice guideline. Ann. Intern. Med. 158 (11), 825–830 (2013).
Article PubMed Google Scholar
Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 52, 91–118 (2003).
Article Google Scholar

Download references

Acknowledgements

The authors would like to thank all the participants of the NHANES. We thank the individual patients who provided the sample that made data available; without them the study would not have been possible.

Funding

This work was supported by grants from the Youth Program for Shandong Province Science and Technology of Traditional Chinese Medicine (No. Q-2022117); the National Nature Science Foundation of China (No. 82200968); the Provincial Nature Science Foundation of Shandong (No. ZR2022QH295).

Author information

Yi Qin and Liping Xuan contributed equally to this work.

Authors and Affiliations

Department of Thoracic Surgery, The Affiliated Hospital of Qingdao University, Qingdao, China
Yi Qin & Zhe Wu
Department of Endocrinology, Guangdong Provincial Geriatrics Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
Liping Xuan
Department of Geriatric Medicine, The Affiliated Hospital of Qingdao University, Jiangsu Road No.19, Qingdao, China
Shujie Wang
Department of Endocrinology and Metabolism, The Affiliated Hospital of Qingdao University, Qingdao, China
Yujie Deng & Shujie Wang
Department of Rheumatology, The Affiliated Hospital of Qingdao University, Qingdao, China
Bin Liu

Authors

Yi Qin
View author publications
Search author on:PubMed Google Scholar
Liping Xuan
View author publications
Search author on:PubMed Google Scholar
Zhe Wu
View author publications
Search author on:PubMed Google Scholar
Yujie Deng
View author publications
Search author on:PubMed Google Scholar
Bin Liu
View author publications
Search author on:PubMed Google Scholar
Shujie Wang
View author publications
Search author on:PubMed Google Scholar

Contributions

SW designed and supervised the study and acted as the corresponding author and guarantor. YQ and LX served as first authors and contributed to the study implementation, data analysis, and manuscript preparation. ZW, YD and BL contributed to result interpretation, revisions, and finalizing the manuscript. All authors reviewed and approved the manuscript’s final version.

Corresponding author

Correspondence to Shujie Wang.

Ethics declarations

Ethics approval and consent to participate

The NHANES is a public-use dataset available through the website. The NHANES protocol was approved by the institutional review board of the Centers for Disease Control and Prevention (https://www.cdc.gov/nchs/nhanes/ irba98. htm). NHANES has obtained written informed consent from all participants.

Competing interests

The authors declare no competing interests.

Compliance with ethics guidelines

All authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Qin, Y., Xuan, L., Wu, Z. et al. Use of consensus clustering to identify distinct subtypes of chronic kidney disease and associated mortality risk. Sci Rep 14, 29893 (2024). https://doi.org/10.1038/s41598-024-81208-1

Download citation

Received: 08 January 2024
Accepted: 25 November 2024
Published: 02 December 2024
Version of record: 02 December 2024
DOI: https://doi.org/10.1038/s41598-024-81208-1

Subjects

Abstract

Background

Methods

Results

Conclusions

Similar content being viewed by others

Mortality among individuals with chronic kidney disease based on the 2012 and 2021 KDIGO blood pressure targets

The associations of calf circumference and cardiovascular and all-cause mortality among population with chronic kidney disease stages 3–5

A clustering approach to improve our understanding of the genetic and phenotypic complexity of chronic kidney disease

Background

Results

Study population

Determination of cluster number

Distributions of characteristics by clusters

Associations of CKD clusters with mortality

Sensitivity analysis

Discussion

Methods

Study population

Mortality data

Variables

Definition of CKD

Statistical analysis

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Competing interests

Compliance with ethics guidelines

Additional information

Publisher’s note

Electronic supplementary material

Supplementary Material 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Quick links