Abstract
This study aimed to explore in depth the characteristics of risk factors for the pathogenesis and progression of diabetes and prediabetes in a specific region. We investigated medical data from approximately 160,000 cases in a newly developing urban area of a large modern city from 2015 to 2021. After individuals with incomplete data were excluded, 47,608 people who underwent physical examinations and blood tests were included in the study. In this population (mean age 41.3 ± 12.6 years), 5.0 ± 0.6% had diabetes and 5.3 ± 2.0% had prediabetes. Risk factor assessment across health states suggested that early risk factors for diabetes pathogenesis were associated with aging, metabolic disorders and obesity, whereas subsequent risk factors for disease progression were liver, cardiovascular and kidney dysfunction. A machine learning model was developed for disease risk estimation; after training, its precision and recall reached 0.76 and 0.86, respectively, with an F1 score of 0.81. Moreover, the prevalence of diabetes was higher in men than in women (6.68% vs. 2.61%, χ2 = 1415.68, p < 0.001), even though both groups lived in the same urban area and were of similar age. Diabetes and prediabetes can improve, and even revert to a normal state, through a healthy lifestyle. Taken together, the risk factors were independent but acted synergistically in the pathogenesis and progression of diabetes. Early health-management intervention, especially individualized strategies targeting obesity and metabolism, is very helpful for preventing diabetes as people age.
Introduction
Diabetes mellitus, a metabolic disease characterized by high blood glucose, has rapidly become more prevalent in recent years and is one of the major noncommunicable diseases affecting human health. According to the latest report of the International Diabetes Federation (IDF), 537 million adults (10.5%) were living with diabetes worldwide in 2021, and this number may rise to 783 million by 2045 if current trends continue1. Notably, the prevalence of diabetes is significantly higher in urban populations than in rural areas. Moreover, China ranks at the top of the list of countries and territories, with 12.8% of adults living with diabetes2.
Diabetes can be divided into type 1 diabetes and type 2 diabetes. Most cases of type 2 diabetes are caused by an absolute or relative deficiency of insulin combined with insulin resistance of target tissues3,4. Type 2 diabetes is regarded not only as an age-related degenerative chronic disease5 but also as a "rich man's disease"6, because it is a metabolic disease driven by excessive nutrition7,8. Consequently, type 2 diabetes is difficult to cure completely. Neglecting blood glucose monitoring and control can lead to various acute and chronic complications, such as ketoacidosis, stroke, kidney disease, eye damage and infection9,10,11,12. Diabetes and its complications cause not only long-term harm and economic burden but also high mortality, which has made diabetes a key global health issue.
Currently, drug treatment plays a positive role in controlling blood glucose and reducing the occurrence of complications4,13. Increasing evidence shows that diabetes is affecting ever younger patients14, and disease-associated risk factors, including lifestyle patterns, psychological stress, sleep quality, diet and physical activity, are receiving growing attention15,16. Under unfavorable conditions, these factors can increase the prevalence of diabetes or worsen the disease. Therefore, controlling these risk factors, especially at an early stage, may help to reduce the incidence or improve the prognosis of diabetes.
Most patients with diabetes have no clinical symptoms at an early stage, which makes early detection difficult; by the time symptoms appear, complications may already have occurred. Public health examinations can aid risk assessment by identifying disease at an early clinical stage and detecting disease risk at the preclinical stage, and can thus help reduce the risk of disease onset.
In addition, the prevalence of diabetes varies across regions, possibly owing to differences in environment, customs and occupational characteristics. China's economy has developed steadily in recent years, and the prevalence of diabetes has increased annually. An epidemiological survey of diabetes among the health examination population in specific regions of China will help to develop personalized disease prevention strategies suited to local population characteristics14.
To explore the incidence of and major risk factors for diabetes in China's newly developing cities and to improve the health management of this disease, we collected health data over the past six years from the population of a newly developing region in Chongqing, a large city in southwest China. Statistical analysis revealed the basic features of diabetes risk factors in this region, and using bioinformatics techniques, we established a machine learning model to reproduce the synergistic effect of these risk factors, which helps in understanding disease pathogenesis and assessing risk. This study provides new insights and suggestions for diabetes prevention and management.
Methods
Data collection and preparation
The data were collected every two years from 2015 to 2021 from the population of a newly developing region of Chongqing, a large modern city. Participants underwent a series of medical examinations at the outpatient health management center of JinShan Hospital, the First Affiliated Hospital of Chongqing Medical University. Personal information, physical measurements and laboratory data, such as routine blood and urine tests, biochemical data, and liver and kidney function test results, were collected. Every sample used in this study included at least 64 principal examination items, including fasting blood glucose.
Apart from removing a few extreme values, we kept the data as close to the original as possible. For instance, prevalence was calculated from adult samples aged over 20 years, consistent with the conventional standard used in previous studies, whereas polynomial regression analysis was performed on the whole dataset to capture overall trends.
Statistical analysis
Diagnosis and classification followed the World Health Organization (WHO) and International Diabetes Federation (IDF) criteria17: a fasting plasma glucose level of 7.0 mmol/L or higher indicates diabetes, and a level of 6.1 to 6.9 mmol/L indicates prediabetes. Healthy values were defined as between 3.9 and 6.1 mmol/L, and level 2 hypoglycemia was defined as a glucose level below 3.0 mmol/L according to the classification of hypoglycemia18,19.
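The classification rule above can be written as a small function. This is a minimal sketch: the label strings, the boundary handling (6.1 mmol/L counted as prediabetes, 7.0 mmol/L as diabetes, 3.9 mmol/L as normal) and the "below normal" label for values from 3.0 to 3.9 mmol/L, which the criteria above do not name, are assumptions for illustration.

```python
def classify_fpg(fpg_mmol_l: float) -> str:
    """Return a health-status label for one fasting plasma glucose value (mmol/L).

    Cut-offs follow the WHO/IDF thresholds quoted in the text; the label
    strings and boundary handling are illustrative assumptions.
    """
    if fpg_mmol_l >= 7.0:
        return "diabetes"
    if fpg_mmol_l >= 6.1:
        return "prediabetes"
    if fpg_mmol_l >= 3.9:
        return "normal"
    if fpg_mmol_l >= 3.0:
        return "below normal"  # assumed label; not defined in the text
    return "level 2 hypoglycemia"


# Example: classify a few hypothetical readings.
readings = [2.8, 3.5, 5.2, 6.4, 7.0, 9.1]
labels = [classify_fpg(v) for v in readings]
```

With pandas, the same rule could be applied column-wise, e.g. `df["status"] = df["FBG"].map(classify_fpg)`, where `df` and the `FBG` column name are hypothetical.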
The chi-square test was used for categorical variables, and correlations among continuous variables were analyzed with the Pearson or Spearman method. Both Student's t test and the Wilcoxon signed-rank test were used for pairwise group comparisons, and P < 0.001 was considered statistically significant.
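A pairwise comparison applying both tests at the P < 0.001 threshold can be sketched as follows. SciPy is an assumption here (the paper names only pandas, matplotlib, seaborn and scikit-learn). Note that the Wilcoxon signed-rank test is defined for paired observations; for two independent groups, this sketch uses the Wilcoxon rank-sum form (Mann-Whitney U test, `scipy.stats.mannwhitneyu`), which is an assumption about how such a comparison would be run.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.001):
    """Compare one continuous variable between two independent groups.

    Returns both p-values and whether BOTH fall below alpha.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    t_p = stats.ttest_ind(a, b).pvalue  # Student's t test (equal variances)
    rank_p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue  # rank-sum test
    return {"t_p": t_p, "rank_p": rank_p,
            "significant": bool(t_p < alpha and rank_p < alpha)}


# Synthetic example (not study data): mean shift of 1.0 with SD 0.5, 500 per group.
rng = np.random.default_rng(0)
healthy = rng.normal(5.0, 0.5, 500)
disease = rng.normal(6.0, 0.5, 500)
result = compare_groups(healthy, disease)
```

Requiring both tests to pass makes the criterion robust to non-normal distributions, at the cost of some sensitivity.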
The Python scripts written for this study used the pandas, matplotlib and seaborn toolkits for data cleaning, visualization, kernel density estimation and correlation heatmaps20.
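A correlation heatmap of the kind described above is drawn from a correlation matrix. A minimal pandas sketch on a hypothetical toy table (column names and values are invented) computes both the Pearson and Spearman matrices; either DataFrame could then be passed to `seaborn.heatmap` for display.

```python
import pandas as pd

# Hypothetical examination records; not study data.
df = pd.DataFrame({
    "age": [25, 32, 41, 47, 53, 60],
    "BMI": [20.1, 22.4, 23.0, 25.2, 26.8, 27.5],
    "HDL": [1.6, 1.5, 1.3, 1.2, 1.1, 1.0],
})

# Pearson measures linear association; Spearman measures monotonic
# association by correlating ranks instead of raw values.
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")
```

Here BMI rises and HDL falls strictly with age, so the Spearman coefficients are exactly +1 and -1 even though the relationships are not perfectly linear.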
Feature extraction and machine learning model
Features were identified from the distance between the distributions of each variable in different disease states. A variable was retained as a feature factor if the variant degree (VD) of this distance was greater than 10 and p < 0.001 according to both Student's t test and the Wilcoxon test.
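The text does not define VD. One reading consistent with the age comparison between men and women (41.37 vs. 41.16 years, VD of about 0.5) is the absolute difference of group means as a percentage of the reference mean: 100 × 0.21 / 41.16 ≈ 0.51. The sketch below adopts that definition as a hypothetical assumption, uses the rank-sum form of the Wilcoxon test for independent groups, and runs on invented column names and synthetic data.

```python
import numpy as np
import pandas as pd
from scipy import stats


def variant_degree(reference, other):
    """HYPOTHETICAL definition of VD: percent difference of group means."""
    ref_mean = np.mean(reference)
    return 100.0 * abs(np.mean(other) - ref_mean) / abs(ref_mean)


def select_features(df, status_col, healthy, disease, vd_min=10.0, alpha=0.001):
    """Return columns whose VD exceeds vd_min and whose t test and
    rank-sum p-values are both below alpha."""
    a = df[df[status_col] == healthy]
    b = df[df[status_col] == disease]
    selected = []
    for col in df.columns.drop(status_col):
        x = a[col].to_numpy(dtype=float)
        y = b[col].to_numpy(dtype=float)
        vd = variant_degree(x, y)
        t_p = stats.ttest_ind(x, y).pvalue
        r_p = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        if vd > vd_min and t_p < alpha and r_p < alpha:
            selected.append(col)
    return selected


# Toy data: "TG" differs strongly between groups, "HR" barely differs.
rng = np.random.default_rng(1)
n = 300
demo = pd.DataFrame({
    "status": ["normal"] * n + ["diabetes"] * n,
    "TG": np.r_[rng.normal(1.2, 0.3, n), rng.normal(1.8, 0.3, n)],
    "HR": np.r_[rng.normal(72, 8, n), rng.normal(73, 8, n)],
})
features = select_features(demo, "status", "normal", "diabetes")
```

The VD threshold screens out differences that are statistically significant but too small to matter, which is a real risk with tens of thousands of samples.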
The dynamic change in risk level was assessed for each feature factor by polynomial regression, and goodness of fit was judged by the minimum standard error. On this basis, a disease prediction model was established. For comparison, a support vector machine (SVM) model was evaluated under tenfold cross-validation. The Python scripts for these models used modules from scikit-learn21.
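The SVM comparison under tenfold cross-validation can be sketched with scikit-learn. This sketch assumes an RBF kernel, feature standardization and F1 scoring, none of which the text specifies, and uses synthetic data in place of the selected risk factors. It does not attempt to reproduce the authors' polynomial regression model, whose details are not given.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 10 selected risk factors (not study data).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, class_sep=2.0, random_state=0
)

# Scaling inside the pipeline keeps each fold's test data out of the scaler fit.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# An integer cv with a classifier uses stratified tenfold cross-validation.
scores = cross_val_score(svm, X, y, cv=10, scoring="f1")
print(f"F1 per fold: {scores.round(2)}; mean = {scores.mean():.2f}")
```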
Evaluation methods
The feature-selected dataset was randomly divided into a training set and a test set at a ratio of 7:3 using train_test_split from sklearn.model_selection21. The trained classifier was then used to predict the test set, and the accuracy, precision, recall and F1 score were calculated. Finally, a receiver operating characteristic (ROC) curve was drawn, and the area under the curve (AUC) was calculated to verify the reliability and evaluate the performance of the constructed diagnostic models.
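A minimal sketch of this hold-out evaluation with scikit-learn follows. The synthetic imbalanced data, the SVM classifier (a stand-in, since the polynomial regression model is not described in enough detail to reproduce), stratification and random seeds are all assumptions. The ROC AUC is computed from continuous decision scores rather than from hard class labels.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (about 20% positives); not study data.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], class_sep=1.5, random_state=0)

# 7:3 split; stratify=y (an assumption) keeps the class ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
y_pred = clf.predict(X_test)
scores = clf.decision_function(X_test)  # continuous scores for the ROC curve

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, scores),
}
fpr, tpr, thresholds = roc_curve(y_test, scores)  # points for plotting the ROC curve
```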
Ethics
This research was approved by the Ethics Committee of the First Affiliated Hospital of Chongqing Medical University (2022-K122). Data collection and use complied with the principles of medical ethics and the requirements of the Declaration of Helsinki, without any risk or damage to the health, security or privacy of the participants. Informed consent was obtained from all participants prior to enrollment in this study.
Results
Demographic characteristics and incidence of diabetes and prediabetes
In this study, we investigated the diabetes epidemic in a newly developing district of a large modern city. In total, medical data for 161,638 adults were collected every two years from 2015 to 2021. After individuals with incomplete data were excluded, 47,608 people were included in this study. The average age of the included population was 41.3 years, and the average body mass index (BMI) and waist circumference were within the normal range. The prevalence of diabetes in this region was only 5.0 ± 0.6% over the past six years, significantly lower than the 12.8% recently reported by the IDF for China. Notably, 5.3 ± 2.0% of participants had prediabetes, which enabled us to explore the risk factors for diabetes and the characteristics of its progression under preclinical and disease conditions. The main demographic characteristics of the included population are detailed in Table 1.
BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure; HR, heart rate; bpm, beats per minute; HB, hemoglobin; PLT, platelet; Alb, albumin; ALT, alanine aminotransferase; AST, aspartate aminotransferase; Crea, creatinine; UA, uric acid; TC, total cholesterol; TG, triglyceride; HDL, high-density lipoprotein; LDL, low-density lipoprotein; FBG, fasting blood glucose; U-Pro, urine protein; U-Glu, urine glucose.
Table 2 shows that the prevalence of both prediabetes and diabetes increased from 2015 to 2019. Over this period, newly affected individuals tended to be younger and to have a lower BMI. Since 2020, the hospital has provided active health guidance, such as exercise, diet and other health interventions, for residents of the region. The physical examination results in 2021 showed that the incidence of prediabetes and diabetes decreased, with prediabetes declining much faster than diabetes (Table 2).
Distribution of fasting plasma glucose levels in the local population
The fasting blood glucose distribution of the included population showed that glucose levels in the healthy group were normally distributed, whereas those in the prediabetic and diabetic groups were skewed, with a long tail toward higher glucose levels (Fig. 1). These population data are consistent with physiological glucose regulation: once plasma glucose escapes control and exceeds the upper threshold, the disease progresses step by step. Because pathogenesis is a gradually worsening process, early intervention is necessary for disease prevention.
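The direction of skew in a group's glucose distribution can be checked numerically with the sample skewness: a positive value indicates a longer tail toward higher values (right skew), and a negative value a longer tail toward lower values (left skew). A minimal sketch with SciPy on hypothetical readings (not study data):

```python
from scipy import stats

# Hypothetical fasting glucose values (mmol/L) for a diabetic group:
# most values sit just above 7.0 with a few much higher readings.
fpg = [7.0, 7.1, 7.2, 7.3, 7.5, 7.8, 8.0, 9.5, 12.0, 15.0]

g1 = stats.skew(fpg)  # > 0 means a long tail toward high values
direction = "right (toward higher values)" if g1 > 0 else "left (toward lower values)"
```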
Because the cutoff point for diagnosing impaired fasting glucose remains controversial22, the large number of relatively young participants may help to evaluate and distinguish blood glucose levels among healthy individuals and individuals with prediabetes and diabetes. First, we found that fasting plasma glucose showed a slightly skewed distribution in the whole population, with a tail toward higher values (the blue curve in Fig. 2), whereas the left side of the distribution was stable because hypoglycemia is rare in the absence of nutrient deficiency in such an affluent region. Given this, we applied a normalizing transformation by mirroring the left part of the blue curve about its peak and obtained a normal distribution (the green area in Fig. 2). In this distribution, when the confidence interval was set to 99.7%, the lower boundary was 3.9 mmol/L, exactly the lower boundary of the healthy criteria, and the corresponding upper boundary was 6.3 mmol/L, slightly higher than the upper boundary of the current healthy criteria. When the lower limit was set to 3.0 mmol/L, the cutoff point for level 2 hypoglycemia18,19, the corresponding upper limit was 7.2 mmol/L, slightly higher than the current diagnostic cutoff point for diabetes.
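The mirror-symmetry step can be sketched as follows. The text does not say how the peak was located; this sketch assumes a Gaussian kernel density estimate (`scipy.stats.gaussian_kde`) and reads the 99.7% interval as peak ± 3 standard deviations (the three-sigma rule). Both reported pairs of boundaries are symmetric about 5.1 mmol/L, since (3.9 + 6.3)/2 = (3.0 + 7.2)/2 = 5.1, which implies a standard deviation of about 0.4 mmol/L for the 99.7% pair. The synthetic data below are built around those implied values and are not study data.

```python
import numpy as np
from scipy import stats


def mirror_normalize(values, grid_points=2000):
    """Reflect the part of a distribution left of its peak about that peak.

    Returns the peak and the standard deviation of the resulting symmetric
    sample. Locating the peak with a Gaussian KDE is an assumption.
    """
    values = np.asarray(values, dtype=float)
    grid = np.linspace(values.min(), values.max(), grid_points)
    peak = grid[np.argmax(stats.gaussian_kde(values)(grid))]
    left = values[values <= peak]
    mirrored = np.concatenate([left, 2 * peak - left])  # symmetric about peak
    return peak, mirrored.std()


def mirrored_upper(lower, peak):
    """Upper boundary symmetric to a chosen lower boundary."""
    return 2 * peak - lower


# Synthetic whole-population sample: a healthy core plus a right-hand tail.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(5.1, 0.4, 20000), rng.normal(8.0, 1.5, 1000)])
peak, sd = mirror_normalize(sample)
interval_997 = (peak - 3 * sd, peak + 3 * sd)
```

Because only the left half is used, the high-glucose tail contributed by prediabetic and diabetic individuals does not inflate the estimated spread.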
Normalized distribution of fasting plasma glucose levels in the healthy group. The normalized distribution is shown as the green area, calculated by mirror-symmetric transformation of the area to the left of the peak in the original plot (the blue line). In the normalized distribution, one pair of boundary values (3.0 mmol/L and 7.2 mmol/L) was observed at the 100% confidence interval, and the other pair (3.9 mmol/L and 6.3 mmol/L) at the 99.7% confidence interval.
Diabetes risk factor analysis and feature extraction
To assess the risk factors for diabetes, correlation analysis was performed with 64 principal items, including physical examination, routine blood tests, routine urine tests and biochemical tests. Although plasma glucose is a continuous variable, Spearman's correlation analysis revealed no monotonic relationship between it and any of the items. Moreover, a healthy status is maintained over a wide range of glucose concentrations until the upper threshold is exceeded. Therefore, we compared the distributions of continuous variables between healthy controls and individuals in each disease state. Variables were retained when the VD of the distance between the two states was greater than 10 and the p values of both the t test and the Wilcoxon test were less than 0.001. In total, of the 64 items, 8 characteristic factors were associated with prediabetes risk (Fig. 3A) and 12 with diabetes risk (Fig. 3B).
Blood and urine glucose indicators were excluded from the risk factors because they had already been used to classify health status.
Distributions of raw examination values in different health states. (A) Individuals with normal status versus those with prediabetes; (B) individuals with normal status versus those with diabetes. All p values were less than 0.001 according to the t test and Wilcoxon test.
Differences in the prevalence and risk factors between males and females
The chi-square (χ2) test showed that diabetes was much more common in men than in women. As Table 3 shows, men in this area were at greater risk, whereas women had a much lower risk than expected (p < 0.001).
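A sex-by-status chi-square test of this kind can be sketched with pandas and SciPy. The counts below are hypothetical and are not the study's Table 3. Whether the authors applied a continuity correction is not stated; `scipy.stats.chi2_contingency` applies Yates' correction to 2 × 2 tables by default.

```python
import pandas as pd
from scipy import stats

# Hypothetical individual-level records (not study data):
# 1,000 men with 70 diabetes cases and 1,000 women with 25 cases.
records = pd.DataFrame({
    "sex": ["M"] * 1000 + ["F"] * 1000,
    "diabetes": [1] * 70 + [0] * 930 + [1] * 25 + [0] * 975,
})

# 2 x 2 contingency table: rows = sex, columns = diabetes status.
table = pd.crosstab(records["sex"], records["diabetes"])

# For 2 x 2 tables SciPy applies Yates' continuity correction by default.
chi2, p, dof, expected = stats.chi2_contingency(table)

prevalence_pct = records.groupby("sex")["diabetes"].mean() * 100
```

The `expected` array holds the counts implied by independence of sex and status; "lower risk than expected" corresponds to observed cases falling below these values.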
To investigate the reason for this difference, we compared the factors between men and women. Age was initially considered the main explanatory risk factor, but the ages of men and women were very close: 41.37 ± 12.51 years for men and 41.16 ± 12.71 years for women, with a variant degree (VD) of only 0.5. The main factors with a VD greater than 10 differed significantly between men and women (p < 0.001) (Table 4). After factors reflecting physiological sex differences, such as RBC, Hb and UA, were ruled out, the remaining diabetes-related factors were BMI, weight, ALP and urea, revealing these, rather than age, as the diabetes risk factors that differ between the sexes.
Estimation of the risk model based on the polynomial regression algorithm
To further investigate the intrinsic relationships and synergistic effects of the above factors during the pathogenesis and progression of diabetes, we attempted to establish a machine learning model to estimate disease risk.
First, to avoid redundant signals in the model, we removed multicollinearity among these individual risk factors. Figure 4 shows the highly correlated pairs, including SBP and DBP, and BMI and body weight; only one variable from each pair was therefore retained in the model. In total, 10 factors were selected.
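Pruning highly correlated pairs can be sketched as follows. The |r| > 0.8 threshold and the rule of keeping whichever variable comes first in column order are assumptions, since the text states neither the cut-off nor how the retained variable was chosen. The data are synthetic.

```python
import numpy as np
import pandas as pd


def prune_collinear(df, threshold=0.8):
    """Drop the later column of every pair with |Pearson r| > threshold."""
    corr = df.corr().abs()
    cols = list(df.columns)
    dropped = set()
    for i, a in enumerate(cols):
        if a in dropped:
            continue
        for b in cols[i + 1:]:
            if b not in dropped and corr.loc[a, b] > threshold:
                dropped.add(b)
    return [c for c in cols if c not in dropped]


# Synthetic data: DBP tracks SBP, weight tracks BMI, age is independent.
rng = np.random.default_rng(0)
n = 500
sbp = rng.normal(125, 15, n)
bmi = rng.normal(23, 3, n)
demo = pd.DataFrame({
    "DBP": 0.6 * sbp + rng.normal(0, 3, n),
    "SBP": sbp,
    "BMI": bmi,
    "weight": 2.8 * bmi + rng.normal(0, 2, n),
    "age": rng.normal(41, 12, n),
})
kept = prune_collinear(demo)
```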
We then estimated the risk associated with each selected variable by third-order (cubic) polynomial regression (Fig. 5). All variables showed a good curve fit with a low standard error except urea and CONDCT, although the fit improved when these two variables were combined. Linear regression revealed that some variables, including age, BMI and HDL-c, which are associated with age-related degenerative changes, obesity and metabolic disorders, were closely correlated with diabetes incidence. In contrast, the others, such as DBP, urea and CONDCT, were only partly related and might also be associated with primary diseases other than diabetes.
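A per-variable risk curve of this kind can be sketched as a cubic least-squares fit of disease prevalence against a binned factor, with goodness of fit summarized by the standard error of the estimate (residual standard deviation with n − 4 degrees of freedom). How the authors binned variables, defined risk and combined urea with CONDCT is not stated, so the binning and the toy prevalence values here are assumptions.

```python
import numpy as np


def cubic_risk_fit(x, risk):
    """Fit risk = c3*x^3 + c2*x^2 + c1*x + c0 and return the coefficients
    (highest power first) and the standard error of the estimate."""
    x = np.asarray(x, dtype=float)
    risk = np.asarray(risk, dtype=float)
    coeffs = np.polyfit(x, risk, deg=3)
    residuals = risk - np.polyval(coeffs, x)
    dof = len(x) - 4  # n observations minus 4 fitted coefficients
    se = np.sqrt(np.sum(residuals ** 2) / dof)
    return coeffs, se


# Hypothetical binned data: age-band midpoints and diabetes prevalence (%).
age_mid = np.arange(22.5, 80, 5.0)
prevalence = 0.00004 * (age_mid - 20) ** 3 + 0.5
coeffs, se = cubic_risk_fit(age_mid, prevalence)
```

Because the toy prevalence is an exact cubic, the standard error is essentially zero; real binned prevalence would leave non-zero residuals, and a smaller standard error indicates a better fit.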
We therefore established a polynomial regression model (PRM) and performed a diabetes risk assessment. After training, the model achieved a precision of 0.76 and a recall of 0.86, with an F1 score of 0.81. This performance was almost identical to that of a support vector machine (SVM), a well-known supervised machine learning algorithm, which achieved a precision of 0.76 and a recall of 0.88, with an F1 score of 0.82. Both models obtained a high area under the curve (AUC) of 0.88 on receiver operating characteristic (ROC) curves (Fig. 6a). To account for class imbalance, we also examined precision-recall curves, which likewise showed similar performance for the two models (Fig. 6b). These findings suggest that both models performed satisfactorily for diabetes risk prediction. Unlike the SVM, however, our model makes the contribution of each factor interpretable, which is helpful for the risk assessment of diabetes.
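The reported F1 scores are consistent with the reported precision (P) and recall (R), because F1 is their harmonic mean:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
F_1^{\mathrm{PRM}} = \frac{2(0.76)(0.86)}{0.76 + 0.86} = \frac{1.3072}{1.62} \approx 0.807 \approx 0.81, \qquad
F_1^{\mathrm{SVM}} = \frac{2(0.76)(0.88)}{0.76 + 0.88} = \frac{1.3376}{1.64} \approx 0.816 \approx 0.82
```

With equal precision, the SVM's slightly higher recall accounts for its 0.01 higher F1 score.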
Discussion
In this study, we investigated the prevalence of diabetes and prediabetes in a developing modern city in southwest China and analyzed possible risk factors. The prevalence of diabetes in this region was 5.0 ± 0.6% over the past six years, significantly lower than the 12.8% recently reported by the IDF for China. Notably, 5.3 ± 2.0% of participants had prediabetes, a rate similar to that of diabetes. Thus, the population in this region has a low prevalence of diabetes but a relatively higher rate of prediabetes. We then established a risk prediction model for diabetes and prediabetes using polynomial regression and machine learning, which may help identify high-risk individuals for early diagnosis and intervention and thereby improve patient prognosis.
Early diagnosis of and intervention in diabetes, including its early stages, are effective measures for delaying disease progression and improving patient prognosis, and a risk prediction model helps in the early screening of high-risk groups. We therefore explored the risk factors for diabetes and the characteristics of its progression under preclinical and disease conditions. Across disease states, we identified early risk factors for diabetes pathogenesis associated with aging, metabolic disorders and obesity23,24,25, whereas subsequent risk factors, such as P_LCR, PLT, LYM#, urea and CONDCT, were more likely to be associated with complications involving decreased immunity and cardiovascular and kidney abnormalities. These factors reflect functional and pathological changes during these processes26,27,28,29. This pattern may be useful for developing more rational prevention and treatment strategies at different stages.
Age is one of the risk factors for diabetes. Interestingly, the average age in this area was 2.5 years higher than the national average for China, yet the prevalence was much lower. This suggests that age is an independent, but not the only, risk factor: although aging cannot be stopped, other factors can be modified to decrease disease risk. Since 2020, the health management center of our hospital has carried out early intervention, including health inquiries, periodic follow-up, special examinations, and diet and exercise guidance. This program has helped curb the prevalence at the most recent time point.
This study revealed a higher prevalence of diabetes in men than in women. Unlike in other reports30, age did not play a major role in this region because the ages of men and women were almost the same. Combining this with the risk assessment, we separated the main diabetes risk factors underlying the sex difference in prevalence from factors that differ for physiological reasons, such as Hb and UA31,32. The comparison highlighted BMI, ALP and urea, which are closely related to diet and lipid metabolism33. Although men and women live in the same region, differences in urban lifestyle culture between the sexes might explain these results: men are more likely to eat rich food, whereas many urban women prefer to maintain their figure, resulting in differences in weight and BMI. Insulin resistance and chronic inflammation are key pathways linking type 2 diabetes mellitus to nonalcoholic fatty liver disease (NAFLD), which is associated with insulin resistance and diabetes risk and can be assessed by measuring liver enzymes such as ALP34. The global prevalence of NAFLD in adults has been reported as 32%, significantly higher in males (40%) than in females (26%)35, and a greater tendency toward obesity and less healthy lifestyles among men may contribute to this difference. In addition, urea is the end product of protein metabolism, its serum level is an important indicator of glomerular filtration function36, and studies have shown that it has good predictive value for diabetic nephropathy37. Men are also more likely to have hypertension, metabolic diseases and unhealthy lifestyles. These findings underscore the importance of maintaining a healthy lifestyle and abandoning unhealthy habits; for instance, controlling obesity and maintaining a normal BMI are necessary to prevent the onset and slow the progression of diabetes.
This study combined bioinformatics and machine learning techniques for big data analysis with classical statistics. Both the polynomial regression model and the SVM model were reliable and achieved satisfactory performance in disease prediction and health classification. The polynomial regression algorithm allows us to clearly understand the degree of risk contributed by each variable and their synergistic effect on the occurrence of diabetes. Blood and urine glucose were not used as risk factors or model variables.
Despite the low prevalence, the large dataset from this region shows a characteristic distribution of fasting plasma glucose in the healthy population, with a right boundary value of 7.2 mmol/L, slightly higher than the current cutoff criterion of 7.0 mmol/L and the old standard of 6.8 mmol/L used in earlier years. This suggests that, after improvements in nutrition, the local population may tolerate somewhat higher fasting glucose levels to a certain degree, but this needs to be confirmed in future studies.
Overall, this study revealed the roles of different risk factors and illustrated possible mechanisms underlying the pathogenesis and progression of type 2 diabetes, including lipid metabolism disorders, obesity and body fat accumulation, followed by impaired glucose tolerance, cardiovascular and kidney dysfunction, and even immune suppression. A machine learning model trained with big data from health examinations can aid in assessing diabetes risk and predicting its onset, and may help provide more individualized and precise strategies to prevent this chronic disease at an early stage.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. After discussion among the authors, the raw data may be uploaded if necessary once the article is published.
References
Sun, H. et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res. Clin. Pract. 183, 109119. https://doi.org/10.1016/j.diabres.2021.109119 (2022).
International Diabetes Federation. IDF Diabetes Atlas, 10th edn. Brussels, Belgium (2021). Available at: https://www.diabetesatlas.org
Henning, R. J. Type-2 diabetes mellitus and cardiovascular disease. Future Cardiol. 14 (6), 491–509 (2018).
Artasensi, A. et al. Type 2 diabetes Mellitus: a review of Multi-target drugs. Molecules 25(8), 1987 (2020).
Wang, T. et al. Age-related disparities in diabetes risk attributable to modifiable risk factor profiles in Chinese adults: a nationwide, population-based, cohort study. Lancet Healthy Longev. 2 (10), e618–e628 (2021).
Jankun, J., Al-Senaidy, A. & Skrzypczak-Jankun, E. Can inactivators of plasminogen activator inhibitor alleviate the burden of obesity and diabetes? (review). Int. J. Mol. Med. 29 (1), 3–11 (2012).
Yuan, X. et al. Effect of the ketogenic diet on glycemic control, insulin resistance, and lipid metabolism in patients with T2DM: a systematic review and meta-analysis. Nutr. Diabetes. 10 (1), 38 (2020).
Damon, S. et al. Nutrition and Diabetes Mellitus: an overview of the current evidence. Wien Med. Wochenschr. 161 (11–12), 282–288 (2011).
Dhatariya, K. K. et al. Diabetic ketoacidosis. Nat. Rev. Dis. Primers. 6 (1), 40 (2020).
van Sloten, T. T. et al. Cerebral microvascular complications of type 2 diabetes: stroke, cognitive dysfunction, and depression. Lancet Diabetes Endocrinol. 8 (4), 325–336 (2020).
Doshi, S. M. & Friedman, A. N. Diagnosis and management of type 2 Diabetic kidney disease. Clin. J. Am. Soc. Nephrol. 12 (8), 1366–1373 (2017).
Eid, S. et al. New insights into the mechanisms of diabetic complications: role of lipids and lipid metabolism. Diabetologia 62 (9), 1539–1549 (2019).
Wheeler, D. C. et al. Effects of dapagliflozin on major adverse kidney and cardiovascular events in patients with diabetic and non-diabetic chronic kidney disease: a prespecified analysis from the DAPA-CKD trial. Lancet Diabetes Endocrinol. 9 (1), 22–31 (2021).
Pradeepa, R. & Mohan, V. Prevalence of type 2 diabetes and its complications in India and economic costs to the nation. Eur. J. Clin. Nutr. 71 (7), 816–824 (2017).
Schellenberg, E. S. et al. Lifestyle interventions for patients with and at risk for type 2 diabetes: a systematic review and meta-analysis. Ann. Intern. Med. 159 (8), 543–551 (2013).
Beulens, J. et al. Risk and management of pre-diabetes. Eur. J. Prev. Cardiol. 26 (2_suppl), 47–54 (2019).
World Health Organization & International Diabetes Federation. Definition and diagnosis of diabetes mellitus and intermediate hyperglycemia: report of a WHO/IDF consultation. World Health Organization (2006).
International Hypoglycaemia Study Group. Glucose concentrations of less than 3.0 mmol/L (54 mg/dL) should be reported in clinical trials: a joint position statement of the American Diabetes Association and the European Association for the Study of Diabetes. Diabetes Care. 40 (1), 155–157 (2017).
American Diabetes Association. 6. Glycemic targets: Standards of Medical Care in Diabetes-2018. Diabetes Care. 41 (Suppl 1), S55–S64 (2018).
Hunter, J. D. Matplotlib: a 2D Graphics Environment. Comput. Sci. Eng. 9 (3), 90–95 (2007).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Huang, Y. et al. Association between prediabetes and risk of cardiovascular disease and all cause mortality: systematic review and meta-analysis. BMJ 355, i5953 (2016).
Li, Y. et al. Time trends of Dietary and Lifestyle factors and their potential impact on diabetes Burden in China. Diabetes Care. 40 (12), 1685–1694 (2017).
Bloomgarden, Z. & Ning, G. Diabetes and aging. J. Diabetes. 5 (4), 369–371 (2013).
Estampador, A. C. & Franks, P. W. Precision Medicine in Obesity and type 2 diabetes: the relevance of early-life exposures. Clin. Chem. 64 (1), 130–141 (2018).
Einarson, T. R. et al. Prevalence of cardiovascular disease in type 2 diabetes: a systematic literature review of scientific evidence from across the world in 2007–2017. Cardiovasc. Diabetol. 17 (1), 83 (2018).
Archundia Herrera, M. C., Subhan, F. B. & Chan, C. B. Dietary patterns and Cardiovascular Disease risk in people with type 2 diabetes. Curr. Obes. Rep. 6 (4), 405–413 (2017).
Koye, D. N. et al. The global epidemiology of diabetes and kidney disease. Adv. Chronic Kidney Dis. 25 (2), 121–132 (2018).
Vijan, S. Type 2 diabetes. Ann. Intern. Med. 171 (9), ITC65–ITC80 (2019).
Wild, S. et al. Global prevalence of diabetes: estimates for the year 2000 and projections for 2030. Diabetes Care. 27 (5), 1047–1053 (2004).
Tzounakas, V. L. et al. Sex-related aspects of the red blood cell storage lesion. Blood Transfus. 19 (3), 224–236 (2021).
Jung, J. H. et al. Serum uric acid levels and hormone therapy type: a retrospective cohort study of postmenopausal women. Menopause 25 (1), 77–81 (2018).
Mishra, J., Srivastava, S. K. & Pandey, K. B. Compromised renal and hepatic functions and Unsteady Cellular Redox State during Preeclampsia and Gestational Diabetes Mellitus. Arch. Med. Res. 52 (6), 635–640 (2021).
Noroozi, K. M. et al. Serum liver enzymes and diabetes from the Rafsanjan cohort study. BMC Endocr. Disord. 22 (1), 127 (2022).
Teng, M. L. et al. Global incidence and prevalence of nonalcoholic fatty liver disease. Clin. Mol. Hepatol. 29 (Suppl), S32–S42 (2023).
Cao, Y. F. et al. Plasma levels of amino acids related to Urea cycle and risk of type 2 diabetes Mellitus in Chinese adults. Front. Endocrinol. (Lausanne). 10, 50 (2019).
Papadopoulou-Marketou, N. et al. Biomarkers of diabetic nephropathy: a 2017 update. Crit. Rev. Clin. Lab. Sci. 54 (5), 326–342 (2017).
Author information
Contributions
L.X. conceived and initially designed this study; L.X. and YY.W. collected the data. Y.L. and C.Z. designed and supervised the statistical analysis. XC.S. and W.N. participated in the data cleaning, statistical analysis and modeling. Y.L. and C.Z. wrote the manuscript. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Conflict of interest
There are no potential conflicts of interest relevant to this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Xu, L., Sun, X., Wang, N. et al. Risk factor assessment of prediabetes and diabetes based on epidemic characteristics in new urban areas: a retrospective and a machine learning study. Sci Rep 15, 3792 (2025). https://doi.org/10.1038/s41598-025-88073-6
DOI: https://doi.org/10.1038/s41598-025-88073-6








