Introduction

According to the International Diabetes Federation (IDF), prediabetes (PreD) affected 7.5% of the global population in 2019, corresponding to approximately 374 million adults aged 18–99. This figure is expected to rise to 8.6% by 2045 if no prompt actions are taken1. PreD is a condition in which a person’s blood sugar levels are higher than usual but not high enough to be categorized as type 2 diabetes (T2D). Based on the American Diabetes Association criteria, PreD is defined as having an HbA1c level between 5.7% and 6.4% (39 and 47 mmol/mol), a fasting glucose concentration between 100 and 125 mg/dl (5.6 and 6.9 mmol/L), or a 2 h oral glucose tolerance test value between 140 and 200 mg/dl (7.8–11.0 mmol/L). People with PreD are at an increased risk of developing T2D and other health complications, such as heart disease and stroke2. Indeed, around 5–10% of individuals with PreD progress to T2D annually2,3. Furthermore, people with PreD have a 33–65% 6-year probability of developing T2D, compared to 5% for those with normoglycemia4. Fortunately, recent research has shown that the progression from PreD to T2D can be prevented, or at least delayed, in a large fraction of individuals with PreD in response to intensive lifestyle intervention5,6,7,8. These data suggest, therefore, that early detection and treatment of PreD constitute a highly cost-effective and fundamental strategy in T2D prevention. They also underscore the importance of understanding the pathophysiology and the risk factors associated with the onset of PreD.

PreD and T2D are usually associated with overweight and obesity9. Thus, current recommendations for PreD and T2D screening by the American Diabetes Association focus nearly exclusively on adults who are overweight or obese as defined by body mass index (BMI; kg/m2) until the patient meets the age-oriented screening at 45 years10,11. This focus on obese or overweight individuals, however, may lead to missed opportunities for investigation of undetected PreD and T2D in normal-weight (NW) individuals11.

A normal-weight (NW) person generally has a BMI between 18.5 and 24.9 kg/m2. Although most NW adults appear healthy, a considerable percentage may be afflicted by undiagnosed metabolic conditions such as insulin resistance, PreD, T2D, or nonalcoholic fatty liver disease (NAFLD)12. Because these individuals usually have a high body fat mass but a normal BMI, they are known as Normal-Weight Obese (NWO)13. The precise cause of NWO is unknown, but genetics, diet, and physical activity have all been linked to the disorder. There is evidence that, when compared to normal-weight healthy (NWH) participants with a normal BMI and body fat percentage, NWO subjects exhibit changes in body composition, inflammation, and oxidative stress13. Interestingly, a recent study from the USA showed that the prevalence of both PreD and unhealthy waist circumference (abdominal obesity) among diabetes-free adults aged 20 years and older and within a healthy BMI range significantly increased between 1988 and 201211. However, abdominal obesity does not appear to be the primary cause of the observed increase in PreD rate11. Given the above-mentioned annual conversion rate from PreD to T2D, T2D preventive efforts will benefit from future research aimed at establishing the root cause of this rise to help detect PreD in primary care among NW individuals.

Over the last three decades, the Middle East region has witnessed a significant increase in obesity and T2D rates, owing primarily to the adoption of a Western lifestyle characterized by sedentary behavior and the consumption of calorie-dense foods and beverages14. PreD is also highly prevalent in the region, with reported rates ranging between 20 and 40%15,16,17. These alarming data about PreD raise great concern in the region given annual conversion rates, which indicate that the T2D epidemic sweeping the region is set to worsen if nothing is done to prevent the progression from PreD to T2D. Several previous studies have investigated the risk factors of PreD and T2D in overweight or obese people. However, the risk factors associated with PreD in NW subjects have rarely been investigated.

In the present study, we used clinical, demographic, and anthropometric data of NW (BMI of 18.5 to 24.99 kg/m2) or obese (BMI ≥ 30 kg/m2) adults (aged 18 years and older) who were normoglycemic or had PreD, and applied different machine learning techniques to identify the most significant risk factors associated with PreD in NW subjects.

Methods

Study Population

We obtained cross-sectional clinical, anthropometric, and demographic data of 5996 Qatari individuals aged between 18 and 86 years (3,229 females and 2,771 males) from Qatar Biobank (QBB), a national institute that runs a well-phenotyped cohort and has been collecting data from the general population in Qatar since 201218. Inclusion criteria were being over 18 years of age and having an HbA1c < 6.5%. People with type 2 diabetes (HbA1c ≥ 6.5%) and pregnant women were excluded. The flowchart of data processing is shown in Fig. 1.

The institutional review board approved the current project at the Qatar Biomedical Research Institute (IRB number: 2017–001) and QBB (IRB number: Ex-2018‐Res‐ACC‐0123‐0067). All participants gave written informed consent for their data and biospecimens to be used in medical research.

Fig. 1 Flowchart of data processing.

Anthropometric and clinical measures

The Qatar Biobank (QBB) provides 52 clinical measurements along with 9 additional measurements used to assess various aspects of health and physiology through medical tests and imaging, including grip strength, 12-lead ECG, ultrasound scan of carotid arteries, Vicorder artery stiffness, retinal eye test, DXA scan of the whole body, treadmill walking test, lung function, and MRI for eligible participants. For the purpose of this study we only utilized the 52 blood test measurements along with height, weight, body fat measurement, blood pressure, and hip and waist measurements. Consequently, data on 57 variables were requested. Due to space constraints, a comprehensive list of these variables is not provided here. For detailed descriptions of the variables, please refer to the QBB website (https://www.qatarbiobank.org.qa/participate/description-measurements).

The BMI (kg/m2) was calculated as weight in kilograms (kg) divided by measured height in meters squared (m2).

For variable categorization, well-accepted clinical guidelines were used, when available. For BMI (in kg/m2), the Caucasian cut‐offs were used, categorizing BMI into four groups: underweight (BMI < 18.5 kg/m2), normal (BMI 18.5–24.9 kg/m2), overweight (BMI 25–29.9 kg/m2) and obese (BMI ≥ 30 kg/m2).

In this study, we use two groups: normal-weight (NW; BMI 18.5–24.9 kg/m2) and overweight/obese (OWO; BMI ≥ 25 kg/m2).
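As an illustration of the BMI calculation and grouping described above, the following minimal Python sketch computes BMI and maps it to the four categories and the two study groups. The function names (bmi, bmi_category, study_group) are hypothetical and are not part of the study's analysis code. Treating values between 24.9 and 25 as normal weight, and returning no group for underweight participants, are assumptions made here because the text does not specify either case.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the four categories listed in the text."""
    # Assumption: values between 24.9 and 25 count as normal weight.
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


def study_group(value: float) -> Optional[str]:
    """Return 'NW', 'OWO', or None for underweight (assumed not grouped)."""
    category = bmi_category(value)
    if category == "normal":
        return "NW"
    if category in ("overweight", "obese"):
        return "OWO"
    return None


if __name__ == "__main__":
    example = bmi(70.0, 1.75)  # illustrative values, not study data
    print(round(example, 2), bmi_category(example), study_group(example))
```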

Plasma samples of participants who had fasted for at least 6 h were handled according to a standard protocol within 2 h of blood collection. Fasting plasma glucose (FPG), HbA1c, triglyceride (TG), total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C) were analyzed with an automated biochemical analyzer at the central laboratories at the Hamad Medical Corporation in Doha.

PreD cases were defined as those individuals with HbA1c between 39 mmol/mol (5.7%) and 47 mmol/mol (6.4%), whereas controls were those with HbA1c < 39 mmol/mol (5.7%).

Two more variables were calculated: the homeostasis model assessment of insulin resistance (HOMA-IR) and the homeostasis model assessment of β-cell dysfunction (HOMA-B)19. With fasting insulin I0 in µIU/mL and fasting glucose G0 in mmol/L, they were calculated as:

$$HOMA\text{-}IR=\frac{I_{0}\times G_{0}}{22}$$

$$HOMA\text{-}B=\frac{20\times I_{0}}{G_{0}-3.5}\times \frac{1}{100}$$

Training and validation populations

A 65/35 split was used on the 1,160 samples. For the training of the machine learning (ML) models, we used a case–control design that included 109 cases and 645 healthy controls. To validate the models developed in the training stage, we used data from 59 cases and 347 healthy controls (see Fig. 1).
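The text does not state how the 65/35 split was drawn. One common approach, shown here purely as an assumption, is a stratified random split that applies the same fraction within cases and within controls. The function name stratified_split and the seed are hypothetical; the class totals (168 cases, 992 controls) are simply the sums of the training and validation counts reported above.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_fraction: float = 0.65,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, validation_indices), split within each class."""
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for cls in sorted(set(labels)):
        # Indices of all samples belonging to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:])
    return sorted(train), sorted(valid)


if __name__ == "__main__":
    # 1 = PreD case, 0 = healthy control; totals implied by the text.
    labels = [1] * 168 + [0] * 992
    train, valid = stratified_split(labels)
    print("training cases/controls:",
          sum(labels[i] for i in train),
          sum(1 - labels[i] for i in train))
    print("validation cases/controls:",
          sum(labels[i] for i in valid),
          sum(1 - labels[i] for i in valid))
```

Under this assumption, rounding 65% of each class gives 109 cases and 645 controls for training and 59 cases and 347 controls for validation, which matches the reported counts; this agreement does not by itself confirm that the authors used stratification.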

Statistical analysis

All statistical analyses were carried out using R version 3.32.1.1, with the R package “h2o” (version 3.17.0.4195) used for building logistic regression and the other machine learning (ML) models. Variables with > 20% missing values were excluded. The unsaturated iron-binding capacity (UIBC) variable, although 21% of its values were missing, was kept for its importance. All the remaining variables had < 20% missing values and were imputed using the MICE package in R.
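The variable-filtering rule described above (drop variables with more than 20% missing values, with an explicit exception list) can be sketched in a few lines of Python. This is an illustration only: the authors worked in R, the function names missing_fraction and select_variables are hypothetical, and missing values are represented here as None. The imputation step itself is not reproduced.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries in a column that are missing (None)."""
    if not values:
        raise ValueError("empty column")
    return sum(v is None for v in values) / len(values)


def select_variables(columns: Dict[str, Sequence[Optional[float]]],
                     max_missing: float = 0.20,
                     always_keep: Iterable[str] = ()) -> List[str]:
    """Keep columns at or below the missingness limit, plus any exceptions."""
    keep = set(always_keep)
    return [name for name, values in columns.items()
            if name in keep or missing_fraction(values) <= max_missing]


if __name__ == "__main__":
    # Toy columns of 100 entries each; the missingness levels are made up
    # except for UIBC, which mirrors the 21% quoted in the text.
    toy = {
        "TG": [1.0] * 95 + [None] * 5,
        "UIBC": [40.0] * 79 + [None] * 21,
        "other": [2.0] * 70 + [None] * 30,
    }
    print(select_variables(toy, always_keep=["UIBC"]))
```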

Descriptive statistics were used to describe the baseline characteristics of participants. Continuous variables were expressed as means ± standard deviation (SD). The independent Student’s t-test was used to compare means, whereas the \(\chi^{2}\) test was used to compare proportions and to assess the dependence between the prevalence of PreD and the different factors. Statistical significance for all tests was set at p < 0.05 (Tables 1 and 2).

Machine learning models

In this section, we employ a variety of machine learning algorithms, including deep learning (DL), gradient boosting machine (GBM), random forest (RF), and generalized linear models (GLM). As a baseline, we also use a logistic regression (LR) model because of its simplicity and ease of implementation, which make it accessible to researchers with limited machine learning experience and facilitate the creation of their own prediction systems. The other machine learning models, in contrast, excel at capturing complex, non-linear relationships in data, making them highly effective for nuanced pattern recognition.

The package “h2o” (version 3.32.1.1)20 was used for building the machine learning (ML) models.

Random Forest

Random forest (RF) belongs to the class of ensemble-based supervised learning techniques. The random forest algorithm applies the general technique of bagging, or bootstrap aggregating, to decision tree learners. This bootstrapping procedure improves model performance because it decreases the variance of the model without increasing bias. This means that although each tree is a weak learner and sensitive to noise within its respective data, the average or majority vote of many trees is not, as long as the trees are not correlated. Bootstrap sampling is thus used to de-correlate the trees by showing them different parts of the dataset. Random forests automatically rank the importance of variables in a classification problem by considering the average information gain corresponding to each variable across all the trees. We used R package caret to generate random forest models21.
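To make the bagging idea concrete, the sketch below trains several one-split decision stumps on bootstrap resamples of a toy one-feature dataset and combines them by majority vote. It is a deliberately simplified illustration, not the random forest implementation used in the study: it omits the random feature subsetting and variable-importance ranking of a full random forest, and all names (fit_stump, fit_bagged_stumps) and data values are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, class label 0 or 1)


def fit_stump(data: Sequence[Sample]) -> Callable[[float], int]:
    """Fit a one-split decision stump by minimising training errors."""
    majority = Counter(y for _, y in data).most_common(1)[0][0]
    best_errors = sum(y != majority for _, y in data)
    best: Callable[[float], int] = lambda x: majority  # constant fallback
    values = sorted({x for x, _ in data})
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2  # candidate threshold between neighbours
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= cut else right) != y for x, y in data)
            if errors < best_errors:
                best_errors = errors
                best = lambda x, c=cut, l=left, r=right: l if x <= c else r
    return best


def fit_bagged_stumps(data: Sequence[Sample], n_trees: int = 25,
                      seed: int = 0) -> Callable[[float], int]:
    """Train stumps on bootstrap resamples and return a majority-vote predictor."""
    rng = random.Random(seed)
    pool = list(data)
    stumps: List[Callable[[float], int]] = []
    for _ in range(n_trees):
        boot = [rng.choice(pool) for _ in pool]  # sampling with replacement
        stumps.append(fit_stump(boot))

    def predict(x: float) -> int:
        votes = Counter(stump(x) for stump in stumps)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    toy = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0),
           (7.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]
    model = fit_bagged_stumps(toy)
    print(model(1.0), model(9.0))
```

Each stump sees a different resample, so individual split points vary, while the vote over many stumps is more stable than any single one.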

Gradient boosting machine

We used gradient boosting machine (GBM), another ensemble technique, for building a predictive model. The principal idea behind this algorithm is to construct the new base learners to be maximally correlated with the negative gradient of the loss function associated with the whole ensemble. We used R package caret for building a GBM predictive model21.

Deep learning

Deep learning (DL) is a more complex and less interpretable machine learning technique. It is loosely inspired by information processing and communication patterns in biological nervous systems. Recently, deep learning-based models have been successfully applied in computer vision, natural language processing, bioinformatics, and other fields. The problem of PreD identification is a classification problem. In the case of deep learning, we learn a non-linear mapping function that takes as input the feature set, xi, for a given sample and outputs a score yi in [0, 1], i.e., t : xi → yi, where t is the mapping function. In this work, t is a deep fully connected feed-forward neural network (DNN) that exploits the non-linear interactions between the input features to make its prediction. A feed-forward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function under certain mild assumptions on the activation function.

Performance measure

Results are presented as odds ratios (OR) with associated 95% confidence intervals (CI) for a 1-SD increase of the independent variables. The predictive value for PreD of each index was determined by the area under the curve (AUC) in the receiver operating characteristic (ROC) curve analyses. The cut-off point was selected according to the Youden index (sensitivity + specificity − 1). Statistical significance was set at p < 0.05.
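The Youden-index rule for choosing a cut-off can be sketched as follows. The function scans every observed score as a candidate threshold and keeps the one with the largest J = sensitivity + specificity − 1. The function name youden_cutoff, the convention that a score at or above the threshold is called positive, and the toy scores are assumptions for illustration, not the study's code or data.

```python
from typing import Sequence, Tuple


def youden_cutoff(scores: Sequence[float],
                  labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= threshold.
    Labels are coded 1 = case, 0 = control.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_threshold, best_j = min(scores), -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Toy predicted probabilities: five controls and three cases.
    scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.9]
    labels = [0, 0, 0, 0, 0, 1, 1, 1]
    print(youden_cutoff(scores, labels))
```

In the toy data, a threshold of 0.5 captures all three cases while misclassifying one of five controls, so J = 1 + 0.8 − 1 = 0.8, which is the maximum over all candidate thresholds.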

To compare the performance of machine learning models and logistic regression, we focused exclusively on the ROC curve and AUC.

Sensitivity (true positive rate)

Sensitivity is the proportion of actual positive cases that are correctly identified by the classifier. It is calculated as:

$$sensitivity=\frac{TP}{TP+FN}$$

Specificity (true negative rate)

Specificity is the proportion of actual negative cases that are correctly identified by the classifier. It is calculated as:

$$specificity=\frac{TN}{TN+FP}$$

where TP is the number of true positives, TN the number of true negatives, FN the number of false negatives, and FP the number of false positives.
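As a small worked illustration of these two definitions, the Python sketch below counts the four confusion-matrix cells from predicted and actual labels and then applies the formulas. The function names and the toy labels are hypothetical and are not taken from the study data.

```python
from typing import Sequence, Tuple


def confusion_counts(predicted: Sequence[int],
                     actual: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for labels coded 1 = PreD, 0 = control."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


if __name__ == "__main__":
    actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]      # toy labels
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]   # toy predictions
    tp, fp, tn, fn = confusion_counts(predicted, actual)
    print(tp, fp, tn, fn, sensitivity(tp, fn), specificity(tn, fp))
```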

ROC curve

To plot an ROC curve, we calculate the true positive rate (sensitivity) and the false positive rate (1-specificity) at various threshold settings. Then, we plot sensitivity on the y-axis against 1-specificity on the x-axis for each threshold setting. This gives us a curve that shows how sensitivity and specificity change with different threshold values.

AUC is then calculated by measuring the area under the ROC curve. A perfect classifier has an AUC of 1, while a completely random classifier has an AUC of 0.5.

These metrics are particularly suitable for risk score prediction as they provide a comprehensive evaluation of the model’s ability to discriminate between different risk levels, independent of any specific threshold. The ROC curve illustrates the trade-off between sensitivity and specificity, while the AUC quantifies the overall performance across all possible thresholds, ensuring a robust assessment of the model’s predictive capabilities in identifying prediabetes risk.
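The construction of the ROC curve and its AUC described in this section can be sketched in plain Python. The code sweeps the threshold over all observed scores, records (1 − specificity, sensitivity) at each step, and integrates with the trapezoidal rule. The function names roc_points and auc and the toy scores are assumptions for illustration; the study itself relied on R packages.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return ROC points (false positive rate, true positive rate).

    Thresholds are swept from the highest score to the lowest; a sample
    is called positive when its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.8, 0.9]  # toy scores
    print(auc(roc_points(scores, [0, 0, 1, 1])))  # perfectly separated
    print(auc(roc_points([0.5] * 4, [0, 0, 1, 1])))  # uninformative scores
```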

Results

Demographic and clinical characteristics of participants

The baseline characteristics of the participants are presented in Table 1. The percentage of men was 46.21%. The prevalence of PreD was 33%: of the 5996 individuals, 1996 had PreD (HbA1c between 5.7% and 6.4%). The observed distributions of key variables such as age, gender, and other clinical measurements are consistent with population norms, thereby reinforcing the representativeness of our sample.

Table 1 Baseline characteristics of participants (n = 5996).

Table 2 displays the baseline characteristics of the NW participants in the training and validation datasets. The percentage of men in the training population was 49.20%, whereas in the validation population, men represented 51.48%. The prevalence of PreD was 14% in the two sets. HbA1c levels differed significantly between cases and controls in both sets (p < 0.001). In both sets, the individuals with PreD were significantly older than the healthy controls (p < 0.001). Furthermore, triglyceride levels, BMI, and insulin resistance, measured with HOMA-IR, were all significantly higher in the individuals with PreD (p < 0.001).

Table 2 Baseline characteristics of normal weight participants in training and validation datasets.

PreD risk factors for NW group versus others

We fitted a logistic regression model separately on the NW and OWO groups. The independent variables (risk factors) selected by each model, and their corresponding estimates, are shown in Tables 3 and 4, respectively.

Table 3 Forward and backward stepwise logistic regression for PreD in NW group. Table summarizes significant variables.
Table 4 Logistic regression for PreD in OWO group.

Comparison between NW vs. OWO groups using logistic regression model

To identify the risk factors that are only associated with NW, we compared the set of variables selected for the NW group against that selected for the OWO group using stepwise logistic regression models. Figure 2 shows a Venn diagram of the number of intersecting variables between the two models, highlighting the risk factors in NW in green (9 unique variables and 12 overlapping variables) and in OWO in pink (10 unique variables and 12 overlapping variables). Table 5 lists the variables of each model, highlighting the risk factors unique to NW in green (9 variables) and to OWO in pink (10 variables).

Fig. 2 Stepwise logistic regression model: NW versus OWO.

Table 5 Stepwise logistic regression model: NW versus OWO.

Triglyceride-based model

After eliminating the 12 overlapping variables and using only the remaining 9 variables in the NW group, we fitted a forward and backward stepwise logistic regression model on the NW group. The independent variables (risk factors) selected by the model, and their corresponding estimates, are shown in Table 6.

Table 6 Forward and backward stepwise logistic regression for PreD in NW subjects after eliminating the overlapping risk factors.

We noticed that, of the remaining four risk factors, triglyceride is a very relevant risk factor, with an odds ratio of 2.79 and a significant p-value. We therefore built a model with triglyceride as the main risk factor, adjusted for age and gender. To measure the performance of the model, we used the receiver operating characteristic (ROC) curve. Figure 3 shows the ROC curve of the triglyceride-based model, with an AUC equal to 86.27%.

Fig. 3 ROC curve of the triglyceride-based model with an AUC equal to 86.27%.

Furthermore, we adjusted the model by adding the other 3 risk factors, namely Folate, Pulse, and TIBC, in order of their odds ratios, to investigate their effect on the AUC. Figure 4 summarizes the ROC curves of the 8 resulting nested models.

Fig. 4 ROC curves of the 8 nested models.

We noticed that the AUC performance slightly improved when we included the TIBC and Pulse risk factors but decreased when Folate was added to the model.

Comparison between NW vs. OWO models using ML models

We repeated the steps carried out with LR using the other ML approaches. First, we used 4 ML algorithms to rank the risk factors. Table 7 shows the top 10 risk factors ranked by each ML algorithm for OWO individuals. Most importantly, Table 8 shows the top 10 risk factors ranked by each ML algorithm for NW individuals. We noticed that, among the top 10 ranks of the ML algorithms, age was ranked first by most of the models. In GBM1, GLM1, DL1 and RF1, triglyceride is ranked 2nd, 6th, 3rd and 3rd, respectively.

Table 7 The top 10 risk factors ranked by each ML algorithm for the OWO individuals.
Table 8 The top 10 risk factors ranked by each ML algorithm for the NW individuals.

To identify the risk factors that are only associated with NW, we compared the variables selected by the machine learning models for the NW group (Table 8) against those selected for the OWO group (Table 7). Table 9 shows the remaining risk factors ranked by each ML algorithm for the NW individuals after eliminating the overlapping ones. We noticed that triglyceride was a common factor picked by all 4 ML models. DL1 did not rank triglyceride at the top, probably because deep learning tends to give good results on large datasets, which is not the case here. We also noticed that phosphorus was picked by some but not all ML models, but investigating this factor was not the aim of this study.

Table 9 The top risk factors ranked by each ML algorithm for NW individuals after eliminating the overlapping variables present in the OWO group.

Test of equality of ROC areas between machine learning and logistic regression

After running the four machine learning models and the logistic regression model using only triglyceride (adjusted for age and gender), we summarized the performance of the models in Fig. 5. Next, we tested whether the area under the ROC curve for logistic regression differs significantly from that for DL1, RF1, GBM1, and GLM1. To perform this task, we used the roccomp function, which compares the ROC curves of multiple classifiers21,22.

For each curve, roccomp reports summary statistics and provides a test for the equality of the area under the curves, using an algorithm suggested by DeLong et al. (1988)21.

Table 10 summarizes the p-value of each \(\chi^{2}\) test statistic obtained after applying “roccomp”22. We can see that the ROC area of logistic regression is significantly different from that of the Random Forest and Gradient Boosting Machine models.

Fig. 5 ROC curves of the 4 machine learning models using only triglyceride.

Table 10 Test of equality of ROC areas between logistic regression and all the other machine learning algorithms1.

Conclusion & Discussion

To the best of our knowledge, this study is the first to comprehensively investigate the risk factors associated with PreD in a cohort of NW adult individuals in Qatar using different multivariable machine learning (ML) techniques. Our approach allows for a nuanced understanding of the contributory factors, providing a foundation for targeted preventive measures. The different ML models we developed indicate a robust positive correlation between high triglyceride levels and the odds of having PreD in NW Qatari adults.

T2D is a significant global public health issue, with incidence and mortality rates consistently increasing in most countries23. Prediabetes, recognized as a significant independent risk factor for T2D, is central to this public health challenge. The 5–10% annual conversion rate from PreD to T2D3 signals a warning that, without intervention, the T2D epidemic is likely to worsen in the future.

Obesity is a well-established risk factor for both T2D and prediabetes. Hence, current guidelines from national and international health organizations regarding the screening for PreD and T2D are generally limited to individuals who are overweight or obese10. This focus, however, may result in missed opportunities for the early detection of undiagnosed disease in individuals with a healthy weight, given that several studies have shown that prediabetes/T2D can strike hard even when weight is in the normal range (BMI between 18.5 and 24.9 kg/m2)11,24,25,26,27.

Significant efforts have been dedicated to identifying and understanding the risk factors associated with PreD in obese individuals. Nevertheless, the factors linked to PreD in NW individuals remain a topic of ongoing debate. Despite the lower prevalence of PreD in NW individuals compared to their overweight and obese counterparts, identifying the specific metabolic and physiological drivers in this population is crucial. This knowledge is essential for developing effective prevention strategies and personalized clinical management approaches tailored to the unique needs of NW individuals. Understanding these factors is vital for the early detection of prediabetes, which, combined with timely and effective intervention strategies, is crucial in preventing its progression to T2D and potentially reducing the overall incidence and associated mortality rates of T2D.

The different ML models we developed indicate a robust positive correlation between triglyceride levels and PreD in NW Qatari adult individuals. The positive correlation between elevated triglyceride levels and T2D in obese individuals is well established. A recent study involving 1341 people aged 25–44 years reported that high triglyceride levels, obesity, and a low level of education are associated with the risk of developing T2D, regardless of other factors28. It was also reported that a rise in triglyceride levels over time increases the risk of T2D in young men independently of traditional risk factors and associated changes in BMI and lifestyle parameters29. Furthermore, fasting triglycerides in the upper normal range were shown to be independently associated with an increased risk of diabetes mortality in representative USA populations30-31. Zheng and colleagues recently showed a graded positive association between elevated TG levels and inadequate glycemic control for patients with insulin-treated T2D in China32. A linear relation analysis also suggested that a triglyceride genotype score (involving 25 well-established single nucleotide polymorphisms) is linearly related to elevated T2D risk33.

Elevated blood triglyceride levels may occur because of several factors, including (1) genetic factors, which increase the predisposition of some people to higher triglyceride levels, regardless of their weight; (2) dietary factors, mainly the consumption of excessive amounts of simple carbohydrates (sugars and refined grains) and fats, especially saturated and trans fats; (3) insulin resistance, which can develop even in lean individuals; (4) physical inactivity, which can contribute to higher triglyceride levels, irrespective of body weight; (5) medical conditions, including hypothyroidism, kidney disease, or liver disease; and (6) medications, such as corticosteroids, beta-blockers, and specific immunosuppressants.

We do not have data on our sample population’s genetic and dietary factors, physical activity, medical conditions, or medications. However, our data indicate that insulin resistance (measured using HOMA-IR; Table 2), a hallmark of PreD and type 2 diabetes (T2D) (PMID: 28697184), is elevated in the NW participants with PreD in this study. Therefore, while we cannot exclude the role of other factors, the elevated triglyceride levels observed in NW prediabetic individuals in this study may reflect a certain degree of insulin resistance. Symptoms of metabolic syndrome, typically linked to abdominal obesity, are also relatively common among individuals with a normal BMI and waist circumference, a condition referred to as “metabolically obese normal weight” (MONW) or the TOFI phenotype (thin outside, fat inside). This phenotype suggests that despite appearing healthy, NW individuals may have a threshold of fat that renders them insulin-resistant, leading to PreD and eventually T2D34.

The high levels of triglycerides may also indicate hepatic fat accumulation. Recent epidemiological studies have indeed shown that about 20% of the total cases of non-alcoholic fatty liver disease (NAFLD) patients are lean35. Insulin resistance is the primary driver of NAFLD36, and lean NAFLD is associated with an approximately 1.6-fold increased mortality risk37.

Consequently, the findings of this study suggest that NW individuals having PreD may have insulin resistance that leads to fatty liver, which ultimately increases blood triglyceride levels. These factors combined lead to dysregulated glucose homeostasis and prediabetes.

Another factor potentially contributing to the occurrence of PreD in NW individuals is age. Our data indicate that prediabetic NW individuals are significantly older than their healthy counterparts. It is well established that the risk for metabolic diseases, including metabolic syndrome, insulin resistance, and diabetes, increases with age38-39. Furthermore, age-related alterations in plasma triglyceride metabolism and fatty acid partitioning significantly contribute to these metabolic diseases. Specifically, age-induced changes in human triglyceride metabolism include increased plasma triglyceride levels, reduced postprandial plasma triglyceride clearance rates, and elevated ectopic fat deposition, all of which can contribute to age-associated metabolic conditions40.

One of the strengths of our study is its large sample size. According to the Qatar Planning and Statistics Authority, the population of Qatar at the end of March 2024 was 3,080,804 people, with Qataris accounting for approximately 12% of the total, equating to around 369,696 individuals (https://www.psa.gov.qa/en/Pages/default.aspx; accessed on June 26, 2024). In 2015, individuals under 19 made up 47% of all Qatari nationals (https://gulfmigration.org/qatar-population-nationality-qatari-non-qatari-five-year-age-group-2015/). Assuming this percentage remained constant in 2024, approximately 195,939 Qatari adults would be eligible for our study. Thus, with a sample size of about 6,000 individuals, our study covers approximately 3% of the eligible adult Qatari population (6000/195939 ≈ 3%).
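The arithmetic behind this estimate can be reproduced with a few lines of Python. The input figures are the ones quoted in the paragraph above; the function name eligible_population is hypothetical, and the calculation carries the same assumption that the 2015 age structure still held in 2024.

```python
from typing import Tuple


def eligible_population(total_population: int, national_share: float,
                        under_19_share: float) -> Tuple[float, float]:
    """Return (number of nationals, number of adult nationals)."""
    nationals = total_population * national_share
    adults = nationals * (1 - under_19_share)
    return nationals, adults


if __name__ == "__main__":
    # Figures quoted in the text: population at end of March 2024,
    # share of Qataris, and share of Qataris under 19 (2015 data).
    nationals, adults = eligible_population(3_080_804, 0.12, 0.47)
    coverage = 6000 / adults
    print(round(nationals), round(adults), f"{coverage:.1%}")
```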

Additionally, the data used in our research were obtained from a well-phenotyped cohort representative of the general population. Our study is also the first to demonstrate the utility of triglyceride levels in identifying PreD in NW individuals within a Middle Eastern population. Given the shared environmental factors and lifestyle habits, as well as genetic background and ethnicity among many Middle Eastern countries, particularly the Gulf Cooperation Council nations (Qatar, Bahrain, Saudi Arabia, United Arab Emirates, Kuwait, and Oman), our findings may apply similarly in many of these countries.

The main limitation of our study is the cross-sectional design, which does not allow the use of the findings to predict future prediabetes. However, the QBB has recently started to call back the participants for a 5-year follow-up, which will open new avenues for assessing the predictive ability of the different indices longitudinally. We also did not adjust for parameters such as smoking status, medication, or physical activity. Finally, the present study’s findings may not be generalizable to all populations due to the ethnic and geographic characteristics of the study population.

In conclusion, our study demonstrates a strong correlation between elevated blood triglyceride levels and PreD in NW individuals. This finding highlights the potential of triglyceride levels as a biomarker for the early detection of PreD in this population. By identifying individuals at risk through this biomarker, healthcare providers can implement timely and effective intervention strategies to prevent the progression of PreD to full-blown T2D. Given the rising prevalence of T2D globally, especially in populations not traditionally considered at risk, such as those with normal weight, these insights are critical for improving preventative healthcare measures and reducing the overall burden of diabetes. Further investigations are warranted to better understand the mechanisms underlying the elevated TG levels in NW individuals and to longitudinally examine the causal link between elevated TGs and PreD development.