Introduction

The prevalence of type-2 diabetes (T2D) worldwide has increased four-fold in the last four decades1,2, and more than half of people with T2D are estimated to live in Asia2. Increasing evidence supports food-based interventions as promising approaches to improve metabolic health and reduce the risk of T2D and cardiovascular disease (CVD)3,4,5. Whole-diet interventions are favored over single-food or single-nutrient interventions because of their practicality and potential synergistic effects6. For example, dietary patterns such as the Mediterranean Diet and the DASH (Dietary Approaches to Stop Hypertension) Diet have frequently been reported to improve glycemic control and reduce the risk of T2D6,7,8,9,10,11. These diets contain a wide range of dietary components, including fruits, vegetables, nuts, legumes and olive oil, that can favorably affect glucose homeostasis and thereby reduce cardiometabolic disease risk7,12,13. Because people with prediabetes are at much greater risk of developing T2D, particularly in Asia, where predisposition to insulin resistance and β-cell dysfunction is greater14, more evidence is needed on the effectiveness of food-based dietary interventions in improving glycemic health in this population.

Recently, the transition from animal- to plant-based proteins and the adoption of plant-based diets have gained prominence for reasons of both environmental sustainability and health. In plant-based diets, legumes are the main source of protein and are also high in dietary fiber. Legume consumption has been inversely associated with the incidence of T2D15,16. Beyond T2D prevention, plant protein intake has been associated with reduced CVD and all-cause mortality in epidemiological studies17,18,19,20. In a randomized controlled trial (RCT), improvements in glycemic control were observed in people with T2D when legumes were incorporated into a low glycemic index (GI) diet21. A more recent meta-analysis, however, did not find a significant association between legume consumption and T2D in several world regions, which warrants further investigation22. Beyond legumes, benefits for glycemic control have also been attributed to consumption of low GI carbohydrates23,24, antioxidant-rich polyphenol sources such as spices25,26, and an overall high-fiber diet27. A previous meta-analysis suggested that fiber consumption decreases the risk of prediabetes by lowering glycated hemoglobin A1c (HbA1c), with a mean difference of -2.0 mmol/mol27. High-fiber whole grain diets have also been associated with increased glucagon-like peptide-1 (GLP-1) and reduced HbA1c28,29,30.

Recent studies have increasingly highlighted the importance of gut microbiome metabolites in regulating host metabolic health. These metabolites can influence various host metabolic processes, including energy balance, lipid metabolism, and glucose homeostasis31. A large body of evidence has shown that fiber intake alters the structure of the gut microbiome and increases microbial production of short-chain fatty acids (SCFAs)32,33,34,35. SCFAs such as acetate, propionate, and butyrate are among the main products of microbial fermentation of dietary fiber in the gut, and they have been shown to play key roles in regulating host energy metabolism, anti-inflammatory responses, and gut barrier integrity36. In addition, secondary bile acids, which are produced when gut microbes transform primary bile acids, have been linked to the development of metabolic diseases such as obesity and diabetes37. A study of 1107 participants found that obesity and diabetes were significantly associated with reduced serum levels of hyocholic acid (HCA) species. Subsequent in vitro and mouse experiments demonstrated that HCA acts on enteroendocrine L cells to improve glucose homeostasis by activating signaling through the membrane receptor TGR5 and inhibiting signaling through the nuclear receptor FXR, up-regulating proglucagon gene expression and promoting GLP-1 production and secretion38,39. Amino acid metabolism by the gut microbiome has also been shown to be closely related to host metabolism.
One study showed that indole-3-propionic acid (IPA), a gut microbial metabolite of tryptophan, was significantly lower in patients with coronary artery disease (CAD). Combined cellular experiments and experiments in IPA-supplemented or IPA-depleted mice demonstrated that IPA acts on macrophages to promote cholesterol efflux by inhibiting miR-142-5p and thereby inducing ABCA1 expression, and is therefore protective against atherosclerosis40. The gut microbiota can also metabolize choline and L-carnitine from dietary sources (e.g., red meat, eggs and fish) to produce trimethylamine (TMA). This microbiota-derived TMA is absorbed and reaches the liver, where it is converted into trimethylamine N-oxide (TMAO), a compound associated with an increased risk of cardiovascular disease41. This evidence motivates further investigation of how fecal or blood metabolites associated with the gut microbiome may reduce prediabetes risk.

To the best of our knowledge, no study has explored whether, on a background of a portion-controlled, low-calorie diet, differences in dietary quality can further modulate metabolic health in a prediabetic Asian population. Here, we present a randomized controlled study investigating the metabolic health effects of a food-based, portion-controlled intervention diet, consisting of legumes as the main protein and fiber source with low GI starches (e.g., rice and noodles) and prepared with a healthier vegetable oil blend as well as herbs and spices, compared with an isocaloric, portion-controlled control diet consisting of chicken as the main protein source with medium to high GI starches, prepared with corn oil and minimal herbs and spices. Over the 16-week intervention, in addition to glycemic and cardiometabolic markers, we investigated changes in the gut microbiome associated with adoption of either diet in a subset of the study population, with the hypothesis that the intervention diet can reduce the risk of progression to T2D.

Results

Recruitment and study compliance

In this single-blinded, parallel-design study, participants with prediabetes were randomized to receive either a control diet or an intervention diet over 16 weeks. This was a partial feeding study: in both the intervention and control groups, 2 of the 3 main meals were replaced by study meals, which were provided to participants. The flow of participants through each stage of the study is shown in the CONSORT diagram (Fig. 1A); 63 volunteers in the intervention group (IG) and 64 volunteers in the control group (CG) were included in the intention-to-treat analysis, and a subset of 54 volunteers in the IG and 57 in the CG in the gut microbiome analysis. Plasma and stool samples were collected for various measurements at the indicated timepoints throughout the study (Fig. 1B). At baseline, there were no significant differences in sex distribution, blood biomarkers or anthropometric measurements between the two groups (Table S1). Both groups were on calorie-restricted diets, and the nutritional composition of the two diets was comparable except for a significantly higher amount of dietary fiber in the IG due to the inclusion of 100 g of cooked legumes per meal (Table S2). Mean compliance with the provided meals among completers, based on the Intervention Food Records (IFR), was 98.7% and 99.3% for the CG and IG, respectively (Table S3). No significant changes in physical activity were observed during the intervention period compared with baseline, and no adverse gastrointestinal reactions were reported other than the expected increase in bloating from legume consumption in the IG (Table S4).

Fig. 1: Research framework diagram.

A CONSORT flow diagram of the study outlining participant recruitment, assessment for eligibility, randomization, intervention allocation and follow-up, concluding with the analysis stage. B Study design: the dietary intervention was administered to both groups over a period of up to 16 weeks. Blood samples were collected at baseline, week 4, week 8, week 12 and week 16. Dietary records were maintained, and anthropometric measurements and body composition tests were conducted. OGTTs were conducted at baseline, week 8 and week 16. Fecal samples were collected at baseline, week 2, week 4, week 8, week 12 and week 16. We profiled the fecal metagenome in all fecal samples and conducted targeted metabolomics of fecal and serum samples collected at baseline, week 4 and week 16. Lipid profile and glucose homeostasis biomarkers were assessed in all blood samples collected across the five time points. For the metagenomic data, we performed taxonomic annotation and functional characterization; the functional annotation encompassed construction of non-redundant gene sets, annotation of carbohydrate-active enzymes and annotation of functional pathways. Created in BioRender. Latypov, O. (2024) https://BioRender.com/w88p289.

Dietary change during intervention

Dietary assessment data showed a significant reduction in energy intake over the course of the intervention and improved intakes of several other nutrients, while protein adequacy was maintained in both groups (Table 1, Figure S1). These improvements included reductions of approximately 40% in total and saturated fat intake and of more than 33% in sodium intake in both groups, as well as an increase in dietary fiber intake of about 17 g/day in the IG. The increase in fiber intake in the IG was likely due to greater consumption of legumes and low GI starchy foods, which may also explain the IG's lower total carbohydrate intake (by approximately 10%) relative to the CG throughout the intervention.

Table 1 Daily intake of energy and nutrients during dietary intervention. Data are presented as mean ± SEM

Changes in anthropometric measures during intervention

A significant effect of time was observed in both the intervention and control groups for BMI, weight, hip, waist and neck circumference, total, android and gynoid fat mass, android-to-gynoid fat ratio and total lean mass, with all values decreasing significantly from baseline by the end of the intervention (Tables 2 and 3). From baseline to week 16, the intervention and control groups lost on average 3.23 ± 0.18 kg (~5.1%) and 2.7 ± 0.22 kg (~4.3%) of body weight, respectively. Although both fat mass and lean mass decreased, the ratio of total lean mass to total fat mass increased significantly over time in both groups, indicating that fat mass was reduced proportionally more than lean mass. Body composition parameters other than body weight measured by bioelectrical impedance analysis (BIA) are reported in Table S5.
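The lean-to-fat mass ratio can rise even when both compartments shrink, provided fat mass falls by a larger fraction than lean mass. The short Python sketch below makes that arithmetic explicit using hypothetical values (not study data); the helper names are illustrative.

```python
def percent_change(baseline: float, follow_up: float) -> float:
    """Percent change from baseline; negative values indicate a decrease."""
    return 100.0 * (follow_up - baseline) / baseline


def lean_to_fat_ratio(lean_kg: float, fat_kg: float) -> float:
    """Ratio of total lean mass to total fat mass."""
    return lean_kg / fat_kg


# Hypothetical participant: both compartments shrink, fat proportionally more
lean_0, fat_0 = 40.0, 25.0    # baseline lean and fat mass (kg)
lean_16, fat_16 = 39.0, 23.0  # week-16 lean and fat mass (kg)

print(f"lean mass: {percent_change(lean_0, lean_16):.1f}%")  # -2.5%
print(f"fat mass:  {percent_change(fat_0, fat_16):.1f}%")    # -8.0%
print(f"lean/fat ratio: {lean_to_fat_ratio(lean_0, fat_0):.2f} -> "
      f"{lean_to_fat_ratio(lean_16, fat_16):.2f}")           # 1.60 -> 1.70
```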

Table 2 Anthropometric measurements during the dietary intervention. Data are presented as mean ± SEM
Table 3 Body composition measurements during the dietary intervention measured using DEXA. Data are presented as mean ± SEM

Intervention diet improves blood lipids and glycemic indices in prediabetic participants

The temporal trajectory of glycated hemoglobin (HbA1c) differed between groups, with a significantly greater decrease in the IG at week 16 (q(Treatment×Time) = 0.04, β = -0.004, Fig. 2A). Changes over time in fasting glucose (FG), insulin and HOMA-IR were comparable between groups: insulin and HOMA-IR declined over the study period, while FG remained stable (Fig. 2B-D). The most pronounced changes in the lipid profile were in blood cholesterol: total cholesterol (TC) at week 4 (q(Treatment×Time) = 0.02, β = -0.3), high-density lipoprotein cholesterol (HDL-C) at week 4 (q(Treatment×Time) = 0.02, β = -0.06) and low-density lipoprotein cholesterol (LDL-C) at week 12 (q(Treatment×Time) = 0.01, β = -0.16) decreased significantly from baseline in both groups, with greater reductions in the IG (Fig. 2E-G). LDL-C and TC decreased sharply up to week 4 and then plateaued (Fig. 2E, G). In contrast, HDL-C increased in a time-dependent manner after an initial decrease at week 4 (Fig. 2F). The TC/HDL-C ratio and TG did not differ significantly between the two groups (Fig. 2H-I). The 2-hour oral glucose tolerance test (OGTT) revealed no notable changes over time in blood glucose, insulin or fructosamine levels in either group (Fig. S2A). Leptin levels declined significantly in both groups (q(Time) = 2.12 × 10⁻⁸, β = -0.03), whereas adiponectin remained relatively stable over time in both groups (Fig. S2B).
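The parallel decline in insulin and HOMA-IR while FG stays stable follows from how HOMA-IR is usually computed. The section does not state the formula used; assuming the widely used HOMA1-IR approximation (fasting glucose in mmol/L × fasting insulin in µU/mL / 22.5), a minimal Python sketch with hypothetical values is:

```python
def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_uU_ml: float) -> float:
    """HOMA1-IR approximation (assumed formula, not confirmed by the text).

    If glucose is in mg/dL instead of mmol/L, the denominator is 405.
    """
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5


# Hypothetical participant: fasting glucose unchanged, fasting insulin falls
before = homa_ir(6.0, 12.0)
after = homa_ir(6.0, 9.0)
print(f"HOMA-IR {before:.2f} -> {after:.2f}")  # HOMA-IR 3.20 -> 2.40
```

With glucose held constant, HOMA-IR scales linearly with insulin, so the two markers move together.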

Fig. 2: General response of the clinical outcomes during the study.

Changes from baseline in (A) HbA1c, (B) fasting glucose, (C) insulin, (D) HOMA-IR, (E) LDL-C, (F) HDL-C, (G) TC, (H) TC/HDL-C ratio and (I) TG during the intervention are shown. Data presented as percent changes from week 0 (mean ± standard error). #q (Treatment×Time) < 0.05 denote p-adjusted values for the Treatment × Time interaction coefficient in the LMM model (two-sided; HbA1c: 0.04; LDL-C: W4 0.02; HDL-C: W4 0.02, W8 0.03, W12 0.01, W16 0.01; TC: W4 0.02); *q (Time) < 0.05, **q (Time) < 0.01, and ***q (Time) < 0.001 denote p-adjusted values for the time coefficient within the same group. p-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure. Number of participants in IG over weeks: week 0, n = 63; week 4, n = 58; week 8, n = 57; week 12, n = 57; week 16, n = 57; number of participants in CG over weeks: week 0, n = 64; week 4, n = 61; week 8, n = 60; week 12, n = 59; week 16, n = 60. HbA1c: glycated hemoglobin A1c; HOMA-IR: Homeostatic Model Assessment of Insulin Resistance; LDL-C: Low-Density Lipoprotein Cholesterol; HDL-C: High-Density Lipoprotein Cholesterol; TC: Total Cholesterol; TG: Triglycerides.
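The q-values reported here and in the following figures come from the Benjamini-Hochberg procedure. A minimal Python sketch of the step-up adjustment, intended to reproduce what R's p.adjust(..., method = "BH") returns, is shown below; it is illustrative and not the study's analysis code.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices sorted by ascending p-value; rank k (1-based) is position in this order
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Step up from the largest p-value, keeping q monotone in rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
# [0.04, 0.0533, 0.0533, 0.2]
```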

Intervention diet alters the species composition of gut microbiome

Shotgun metagenomic sequencing identified 586 distinct taxa and 524 functional pathways (via MetaCyc), and yielded a catalog of 14,386,339 unique microbial genes. Permutational multivariate analysis of variance (PERMANOVA) of gut microbiome composition showed that the intervention diet, and in particular dietary fiber intake, were significant variables driving microbiome structure, along with others such as LDL-C (Fig. 3A). These changes were visualized using principal coordinate analysis (PCoA) (Fig. 3B, C, Fig. S3A-B). A similar pattern was observed for microbiome function, although dietary fiber was not a significant variable (Figure S3C). Notably, both groups exhibited significant gut microbiota shifts from week 0 to week 2, with a more pronounced change in the IG (p(Treatment×Time) = 0.001), whereas changes from week 2 to week 16 did not differ significantly between the two groups (p(Treatment×Time) = 0.365) (Fig. 3D-E, Fig. S3D). However, taxonomic diversity (Fig. S3F) and gene count (Fig. S3E) did not change over time in either group.
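The PERMANOVA, PCoA and week-to-week comparisons are based on Bray-Curtis dissimilarity, which for two abundance profiles u and v is the sum of |u_i - v_i| divided by the sum of (u_i + v_i). A minimal Python sketch follows, using a hypothetical three-taxon subject (not study data) whose profile shifts sharply between week 0 and week 2 and little thereafter; the function names are illustrative.

```python
def bray_curtis(u: list[float], v: list[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same taxa")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def consecutive_dissimilarities(profiles: dict[int, list[float]]) -> dict[str, float]:
    """Within-subject distances between consecutive sampling weeks, labeled Wx_y."""
    weeks = sorted(profiles)
    return {
        f"W{a}_{b}": bray_curtis(profiles[a], profiles[b])
        for a, b in zip(weeks, weeks[1:])
    }


# Hypothetical subject: abundances of three taxa at weeks 0, 2 and 4
subject = {0: [6, 3, 1], 2: [2, 6, 2], 4: [2, 5, 3]}
print(consecutive_dissimilarities(subject))  # {'W0_2': 0.4, 'W2_4': 0.1}
```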

Fig. 3: Taxonomic alterations of the human gut microbiome during dietary intervention.

A Percentage of fecal microbiome variation explained by time, macronutrients, clinical indicators and anthropometric data in the PERMANOVA model (two-sided) based on Bray-Curtis distances, in each group separately. Red variable names represent significant Treatment × Time interaction effects. *p < 0.05, **p < 0.01, ***p < 0.001. (IG: Time 0.001, Fat 0.01, Energy 0.002, CHO 0.004, Fiber 0.002, Sodium 0.002, TC 0.001, HDL 0.005, LDL 0.001, OGTT 2 h glucose 0.027, OGTT 2 h glucose insulin 0.038, Neck 0.013, Pulse 0.049; CG: Time 0.025, BMI 0.005, Weight 0.007). CHO: Carbohydrate; HbA1c: glycated hemoglobin A1c; HOMA-IR: Homeostatic Model Assessment of Insulin Resistance; LDL-C: Low-Density Lipoprotein Cholesterol; HDL-C: High-Density Lipoprotein Cholesterol; TC: Total Cholesterol; TG: Triglycerides; FG: Fasting Glucose. B, C Principal coordinate analysis of Bray-Curtis distances. The axes are labeled with the percent variance explained. R2 values and p-values were calculated from the two-sided PERMANOVA test. B Points colored by group; C points colored by fiber intake. D Within-subject microbiome dissimilarity between weeks based on Bray-Curtis distances. The central line in each box represents the median. The bounds of the box indicate the interquartile range (IQR), with the lower bound at the 25th percentile (Q1) and the upper bound at the 75th percentile (Q3). The whiskers extend to the smallest and largest values within 1.5 × IQR from the box. Data points beyond the whiskers are considered outliers. *p < 0.05, **p < 0.01, ***p < 0.001 denote two-sided t-test p-values comparing each Wx_y with W0_2, where Wx_y represents the Bray-Curtis distance between samples collected at Wx and Wy (W2_4: 0.002; W4_8: 0.021; W8_12: 0.0097; W12_16: 0.014).
Number of participants in IG over W0_2, n = 54; W2_4, n = 54; W4_8, n = 54; W8_12, n = 53; W12_16, n = 53; number of participants in CG over W0_2, n = 55; W2_4, n = 55; W4_8, n = 57; W8_12, n = 55; W12_16, n = 55. E Percentage of fecal microbiome variation explained by different time periods in both groups. Two-sided PERMANOVA test, **p(Time) < 0.01, ***p(Time) < 0.001. (W0_W2: IG 0.001, CG 0.009; W2_W16: IG 0.004, CG 0.147). F Taxa that changed significantly over time in the intervention and control groups (q(Treatment×Time) < 0.15 and q(Time) < 0.05). The values represent the Z scores of averages of study weeks and groups calculated per taxon. Change trend: positively responding species are annotated in pink and negatively responding species in green. Upper panel: genus; lower panel: species. G, H Scatterplots of the correlation between dietary fiber intake and significantly changed species (G) and genera (H) across all samples. Coefficients and adjusted p-values are derived from two-sided LMM (Eubacterium rectale: q = 6.95 × 10⁻⁵, β = 0.17; Ruminococcus torques: q = 3.5 × 10⁻⁴, β = -0.15; Roseburia faecis: q = 2.7 × 10⁻⁸, β = 0.28; Ruminococcus lactaris: q = 4.7 × 10⁻⁴, β = -0.15; Roseburia hominis: q = 4.6 × 10⁻³, β = 0.17; Parabacteroides distasonis: q = 4.7 × 10⁻⁴, β = -0.13; Lachnospiraceae unclassified: q = 6.95 × 10⁻⁵, β = 0.17; Bifidobacterium: q = 4.0 × 10⁻³, β = 0.16; Bilophila: q = 4.0 × 10⁻³, β = -0.17). p-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure. The smooth curve represents the trend line fitted using a linear model, and the shaded area indicates the 95% confidence interval.
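The R2 values above express the share of total Bray-Curtis variation that PERMANOVA attributes to a factor. The section does not name the software, and the study's models included Treatment × Time terms; the Python sketch below is a simplified one-factor version following Anderson's pseudo-F formulation, with a label-permutation p-value, applied to hypothetical, well-separated profiles rather than study data.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-empty abundance profiles."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))


def pseudo_f(dist, labels):
    """One-factor PERMANOVA pseudo-F and R2 from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by n
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed inside each group
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    f_stat = (ss_between / (a - 1)) / (ss_within / (n - a))
    return f_stat, ss_between / ss_total


def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, R2, permutation p-value) for one grouping factor."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Hypothetical profiles: group A dominated by taxon 1, group B by taxon 3
profiles = [[10, 1, 1], [9, 2, 1], [11, 1, 2], [10, 2, 2], [9, 1, 1],
            [1, 1, 10], [2, 1, 9], [1, 2, 11], [2, 2, 10], [1, 1, 9]]
labels = ["A"] * 5 + ["B"] * 5
dist = [[bray_curtis(u, v) for v in profiles] for u in profiles]
f_stat, r2, p = permanova(dist, labels)
print(f"pseudo-F = {f_stat:.1f}, R2 = {r2:.2f}, p = {p:.3f}")
```

Production analyses typically rely on an established implementation that supports multiple terms and restricted (within-subject) permutations, which this sketch does not attempt.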

The alterations in gut microbiota structure prompted us to investigate which species responded most strongly to the intervention diet. Linear Models for Differential Abundance (LinDA) analysis identified 30 species (q(Treatment×Time) < 0.15, q(Time) < 0.05) and 15 genera (q(Treatment×Time) < 0.05, q(Time) < 0.05) whose trajectories differed between the groups over time. As expected, these taxa remained relatively stable over time in the CG (Fig. 3F, Figure S4A-B). Of the species that changed, 8 increased significantly relative to the CG and are referred to as positively responding species; the remainder showed significant or marginal decreases relative to the CG and are referred to as negatively responding species. Species with significant increases included members of the class Clostridia, such as Eubacterium rectale, Roseburia faecis and Roseburia hominis, as well as the genus Bifidobacterium in the phylum Actinobacteria. Conversely, three Ruminococcus species (R. gnavus, R. torques and R. lactaris), several Bacteroides species (including B. massiliensis and B. stercoris, but not B. ovatus) and Bilophila wadsworthia decreased in relative abundance (Fig. 3F, Figure S4B). The enrichment or depletion of these species peaked by week 2 and remained stable thereafter (Fig. 3F), consistent with the pattern observed in overall gut microbiota structure (Fig. 3D-E).
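Differential-abundance methods such as LinDA work with log-ratio-transformed abundances, which removes the dependence on each sample's sequencing depth; as we understand it, LinDA fits linear (mixed) models to centered log-ratio (CLR) values and then corrects the resulting compositional bias. The Python sketch below shows only the CLR step, with an assumed pseudocount of 0.5 for zero counts; it is not LinDA's implementation.

```python
import math


def clr(counts: list[float], pseudocount: float = 0.5) -> list[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    Each value is log(count + pseudocount) minus the sample's mean log value,
    so the transformed values always sum to zero. With pseudocount=0.0,
    every count must be positive.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Two hypothetical samples with the same composition but a 10-fold depth difference
shallow = [10, 20, 70]
deep = [100, 200, 700]
print([round(x, 3) for x in clr(shallow, pseudocount=0.0)])
print([round(x, 3) for x in clr(deep, pseudocount=0.0)])  # same values as above
```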

LMM further revealed that 13 of the 30 responding species and 7 of the 15 responding genera were significantly correlated with dietary fiber intake (q < 0.05). No other nutrient was significantly associated with these microbiota changes (Fig. S4C). The directions of these associations were consistent with the changes observed during the intervention. For example, E. rectale and the genus Bifidobacterium increased over time in the IG and correlated positively with dietary fiber, whereas R. torques, R. lactaris and R. gnavus decreased and correlated negatively (Fig. 3G-H, Fig. S4C-D).

Intervention diet alters gut bacterial fermentation of carbohydrates

We then conducted a gene-centric analysis to explore functional changes in the gut microbiota that might contribute to the improved host clinical outcomes. Because the dietary fiber in the intervention meals increased the supply of fermentable carbohydrates, we focused on genes associated with carbohydrate utilization, specifically those encoding glycoside hydrolases (EC3.2.1). A total of 21 glycoside hydrolases had significantly different trajectories between the two groups over time, with changes occurring mainly in the IG (q(Treatment×Time) < 0.05, q(Time) < 0.05, Fig. 4A). The glycoside hydrolases that increased are mainly responsible for degrading plant structural carbohydrates such as pectic polysaccharides (EC3.2.1.89, EC3.2.1.99), arabinoxylans (EC3.2.1.37, EC3.2.1.55) and xyloglucan (EC3.2.1.151, EC3.2.1.156), among other polysaccharides. In contrast, enzymes involved in the digestion of trehalose, lactose or short fructooligosaccharides (EC3.2.1.28, EC3.2.1.122) decreased in the IG (Fig. 4A, Figure S5A). To further test the fermentable-carbohydrate hypothesis, we analyzed genes encoding carbohydrate-active enzymes (CAZymes). This analysis identified a total of 204,409 CAZy genes grouped into 323 CAZy gene families, which showed a pattern similar to that of the glycoside hydrolases. Among the CAZy genes, 52 differed significantly between the two groups (q(Treatment×Time) < 0.05). Those contributing to arabinoxylan, cellulose, pectin and starch degradation were significantly enriched, whereas those related to mucin and agarose utilization were reduced in the IG but not in the CG (Figure S5B).
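The selection of responsive glycoside hydrolases combines an enzyme-class filter with two significance thresholds. A minimal Python sketch of that filtering step is below; the record type, field names and q-values are hypothetical, and only the EC 3.2.1 prefix (glycosidases) and the q < 0.05 cut-offs come from the text.

```python
from dataclasses import dataclass


@dataclass
class EnzymeResult:
    ec: str               # EC number as annotated, e.g. "EC3.2.1.55"
    q_interaction: float  # BH-adjusted p for the Treatment x Time term
    q_time: float         # BH-adjusted p for the Time term


def is_glycoside_hydrolase(ec: str) -> bool:
    """Glycosidases (glycoside hydrolases) share the EC 3.2.1 prefix."""
    return ec.removeprefix("EC").strip().startswith("3.2.1.")


def responsive_ghs(results, q_interaction_max=0.05, q_time_max=0.05):
    """Return EC numbers of glycoside hydrolases passing both q-value cut-offs."""
    return [
        r.ec for r in results
        if is_glycoside_hydrolase(r.ec)
        and r.q_interaction < q_interaction_max
        and r.q_time < q_time_max
    ]


# Hypothetical results table (q-values are made up for illustration)
results = [
    EnzymeResult("EC3.2.1.55", 0.01, 0.002),  # glycoside hydrolase, passes
    EnzymeResult("EC3.2.1.28", 0.03, 0.010),  # glycoside hydrolase, passes
    EnzymeResult("EC3.2.1.37", 0.20, 0.010),  # fails the interaction cut-off
    EnzymeResult("EC2.7.1.2", 0.001, 0.001),  # not a glycoside hydrolase
]
print(responsive_ghs(results))  # ['EC3.2.1.55', 'EC3.2.1.28']
```

The filter keeps both increasing and decreasing enzymes; the direction of change is read from the fitted coefficients afterwards.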

Fig. 4: Alterations in gut bacterial fermentation of carbohydrates, SCFA profile and other functions.

A The glycoside hydrolase genes that have significantly different progression profiles in the IG compared with the CG identified by LinDA (q(Treatment×Time) < 0.05 and q(Time) < 0.05). The values represent the Z scores of averages of study weeks and groups calculated per enzyme. B Correlation of blood glucose and lipid levels with intestinal and plasma SCFAs in two groups respectively. *q < 0.05, **q < 0.01, and ***q < 0.001. HbA1c: glycated hemoglobin A1c; HOMA-IR: Homeostatic Model Assessment of Insulin Resistance; LDL-C: Low-Density Lipoprotein Cholesterol; HDL-C: High-Density Lipoprotein Cholesterol; TC: Total Cholesterol; TG: Triglycerides; FG: Fasting Glucose. C Bubble chart of gene enrichment analyses for genes significantly changed in the IG. Size of circle denotes number of genes in the pathway, colored by adjusted p-values. D Changes in genes involved in each step of the histidine degradation pathway. Data presented as mean ± standard error. #q (Treatment×Time) < 0.05, ##q (Treatment×Time) < 0.01 denote q-values for the Treatment × Time interaction coefficient (hutH: W16 0.03; hutU: W16 0.008; hutI: W12 0.01, W16 0.002; hutG: W12 0.01, W16 0.002); *q (Time) < 0.05, **q (Time) < 0.01, and ***q (Time) < 0.001 denote q-values for the time coefficient within the same group. hutH: histidine ammonia-lyase; hutU: urocanate hydratase; hutI: imidazolonepropionase; hutG: formiminoglutamase. Number of participants in IG over weeks: week 0, n = 54; week 2, n = 54; week 4, n = 54; week 8, n = 54; week 12, n = 53; week 16, n = 54; number of participants in CG over weeks: week 0, n = 57; week 2, n = 55; week 4, n = 57; week 8, n = 57; week 12, n = 56; week 16, n = 57. E Changes in histidine degradation pathway abundance annotated by HUMAnN3. Data presented as mean ± standard error. #q (Treatment×Time) < 0.05 denote q-values for the Treatment × Time interaction coefficient; ***q (Time) < 0.001 denote q-values for the time coefficient within the same group. 
Number of participants in IG over weeks: week 0, n = 54; week 2, n = 54; week 4, n = 54; week 8, n = 54; week 12, n = 53; week 16, n = 54; number of participants in CG over weeks: week 0, n = 57; week 2, n = 55; week 4, n = 57; week 8, n = 57; week 12, n = 56; week 16, n = 57. F Contribution of species to the histidine degradation pathway.

In vitro validation showed larger increases in the populations of Eubacterium rectale, Ruminococcus bromii and Bacteroides ovatus in the presence of undigested red bean powder (Fig. S6). Notably, on the first day after inoculation, none of the responsive species except Bacteroides ovatus showed growth, likely reflecting an initial adaptation period. Nonetheless, these fiber-degrading bacteria grew noticeably between the first and second day. In contrast, these changes were not observed in the presence of in vitro-digested or undigested brown rice powder.

To explore whether enhanced carbohydrate fermentation by the gut microbiota led to elevated SCFA levels, we quantified the absolute concentrations of 10 SCFAs in fecal samples and 8 SCFAs in plasma. Changes in absolute fecal SCFA concentrations did not differ significantly between the intervention and control groups (Fig. S7A). Among the 8 SCFAs detected in plasma, acetic acid was significantly higher in the IG (q(Treatment×Time) = 0.05, β = 20.61, Figure S7B).

Correlation analysis further linked SCFA levels to clinical biomarkers. In the IG, the absolute concentrations of several fecal SCFAs were positively correlated with HbA1c (q < 0.05), whereas in the CG only valeric acid was positively correlated, with TG (q < 0.05). Plasma acetic acid was significantly negatively correlated (q < 0.05) with HOMA-IR, insulin and TG in both groups, whereas its negative correlations with HbA1c, LDL-C, TC and the TC/HDL-C ratio were significant only in the IG (Fig. 4B, Fig. S8B).
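The section does not state which correlation coefficient was used. Assuming Spearman's rank correlation, a common choice for skewed metabolite and biomarker data, a stdlib-only Python sketch (Pearson correlation of average ranks, so ties are handled) is below; in practice each coefficient's p-value would then be adjusted, for example with the Benjamini-Hochberg procedure, before applying the q < 0.05 cut-off. The paired values are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements (not study data): plasma acetate vs HOMA-IR
acetate = [52.0, 61.0, 48.0, 75.0, 66.0, 58.0]
homa_ir = [3.1, 2.4, 3.6, 1.9, 2.6, 2.2]
print(round(spearman(acetate, homa_ir), 3))  # -0.771
```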

Gut microbiota changes may improve metabolic outcomes via amino acid metabolism

The 14,386,339 microbial genes were mapped to 7568 KEGG orthologs (KOs). LinDA identified 356 genes with significantly different temporal trajectories between the two groups (q(Treatment×Time) < 0.05) that also changed significantly in the IG. Gene enrichment analysis of these 356 genes showed enrichment mainly in carbohydrate and amino acid metabolism pathways (Fig. 4C, Fig. S9B). Aggregated MetaCyc pathway abundances likewise revealed numerous metabolic pathways with significant temporal shifts in the IG. Among these, amino acid metabolic pathways showed the most significant changes over time in the IG, in contrast to their stability in the CG (Fig. S9A). These pathways were also notably correlated with HbA1c and certain lipids in the IG, whereas these correlations were less evident in the CG (Fig. S10A).
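The section does not name the enrichment test. A one-sided hypergeometric (over-representation) test is a common choice; the Python sketch below assumes the 7568 annotated KOs as the background and the 356 changed genes as the selected set, while the pathway size and overlap are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # Exact integer arithmetic before a single division avoids rounding drift
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


background = 7568   # annotated KOs (from the text), assumed as the background
selected = 356      # genes with differing trajectories (from the text)
pathway_size = 40   # hypothetical number of KOs annotated to one pathway
overlap = 10        # hypothetical number of those KOs among the selected genes

expected = selected * pathway_size / background
p = hypergeom_sf(overlap, background, pathway_size, selected)
print(f"expected overlap {expected:.2f}, observed {overlap}, p = {p:.2e}")
```

Each pathway's p-value would then be adjusted across all tested pathways, as for the other analyses in this section.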

Histidine metabolism produces 4-imidazolone-5-propanoate (IMP) as an intermediate. Both the histidine degradation pathway and its corresponding genes were reduced in the IG (Fig. 4D-E), driven predominantly by the genus Bacteroides, which was itself significantly reduced in the IG (Fig. 4F, Fig. 3F). Consistent with the histidine pathway findings, both intestinal and plasma IMP concentrations showed modest declines in the IG (Wilcoxon p = 0.03), suggesting that the improvements observed in the IG may be mediated through favorable modulation of IMP concentration (Figure S10B).
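For orientation, the four hut genes quantified in Fig. 4D catalyze consecutive steps of bacterial histidine degradation, with IMP formed by urocanate hydratase (hutU) and consumed by imidazolonepropionase (hutI). The standard reaction sequence, which the section itself does not spell out, is:

```latex
\begin{aligned}
\text{L-histidine} &\xrightarrow{\ \text{hutH}\ } \text{urocanate} + \mathrm{NH_3} \\
\text{urocanate} + \mathrm{H_2O} &\xrightarrow{\ \text{hutU}\ } \text{4-imidazolone-5-propanoate (IMP)} \\
\text{IMP} + \mathrm{H_2O} &\xrightarrow{\ \text{hutI}\ } N\text{-formimino-L-glutamate} \\
N\text{-formimino-L-glutamate} + \mathrm{H_2O} &\xrightarrow{\ \text{hutG}\ } \text{L-glutamate} + \text{formamide}
\end{aligned}
```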

Intervention diet significantly alters fecal metabolome but has a minor effect on blood metabolome

We also examined changes in fecal and plasma metabolite profiles before and after the dietary intervention by targeted detection of a panel of 321 metabolites (Supplementary Data 1). Of these, 195 and 193 metabolites were detected in fecal and plasma samples, respectively. Principal coordinate analysis showed that the dietary intervention significantly affected the fecal metabolite profile, whereas changes in plasma metabolite profiles were comparatively subtle (Fig. 5A).

Fig. 5: Alterations in fecal and blood metabolite levels.

A Principal coordinate analysis of fecal and plasma metabolome based on Bray-Curtis distances. R2 values and p-values were calculated from the two-sided PERMANOVA test. All three time points share the same axes, showing the first two principal coordinates, labeled with the percent variance explained. B, C Heatmap of q-values and beta coefficients of fecal metabolites (B) and plasma metabolites (C) across the whole study period. *q (Time) < 0.05, **q (Time) < 0.01, and ***q (Time) < 0.001 denote p-adjusted values for the time coefficient from the LMM (two-sided) performed separately for each group. p-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure. See Supplementary Data 1 for full names of metabolite abbreviations.

At the level of broad metabolite classes, bile acids, carnitines, furans and phenylpropanoic acids showed consistent patterns, decreasing over time in feces from the IG (Figure S11A). In plasma, there was a marked increase in metabolites classified as indoles and benzenoids, which are derivatives of aromatic amino acids (Figure S11B). LMM identified 58 intestinal and 15 plasma metabolites with significantly different temporal trajectories between groups (q(Treatment×Time) < 0.25, Fig. 5B-C). The 58 significantly altered intestinal metabolites, consisting primarily of amino acids and bile acids, aligned with the shifts in metabolic functions predicted earlier from gut microbiota genes. Specifically, intestinal amino acids and their derivatives decreased in the IG while remaining relatively stable in the CG.

Amino acids with differential temporal trajectories between the groups fall into three categories. First, fecal branched-chain amino acids (leucine, isoleucine and valine) decreased in the IG (Fig. 5B), whereas plasma branched-chain amino acids were reduced in both groups (Figure S12A). Second, fecal aromatic amino acids (phenylalanine, tryptophan and tyrosine) diminished over time in the IG relative to the CG, whereas plasma aromatic amino acids decreased in both groups (Fig. S12B). Third, we found significant group-by-time interactions for amino acid derivatives. Derivatives that increased in the plasma of the IG while remaining stable in the CG included (1) indole-3-propionic acid (IPA) (Figure S12C), a reductive product of tryptophan formed by bacteria in the gastrointestinal tract, and (2) methylcysteine and pipecolic acid, which are known products of legume digestion42 (Fig. 5C). In contrast, 1-methylhistidine, a metabolite derived from meat, decreased in both the feces and plasma of the IG while remaining stable in the CG (Fig. 5B-C).

The temporal dynamics of bile acids also differed significantly between treatment groups, with most intestinal bile acids decreasing over time in the IG. For example, intestinal bile acids of the lithocholic acid species (12-ketoLCA, GLCA, isoLCA and LCA-3S) decreased over time (Fig. 5B), whereas GLCA-3S increased in the plasma of the IG (Fig. 5C). Intestinal hyocholic acid species (HCA and HDCA) decreased over time in the IG, but no significant changes in hyocholic acid derivatives were detected in plasma (Fig. 5B-C).

Metabolites act as mediators between alterations in microbiota and clinical outcomes

We subsequently investigated significant correlations among microbiota alterations (species: n = 25, genera: n = 15), metabolite variations (fecal: n = 58, plasma: n = 15) and improvements in clinical outcomes (n = 15). Network analysis, performed separately within each group, showed that fecal metabolites had more extensive associations with gut microbial species and clinical indicators than plasma metabolites did. The positively responding species E. rectale, R. faecis, R. bromii and B. ovatus were negatively correlated with lipid markers such as TC and LDL-C, whereas HbA1c was positively correlated with negatively responding species (Fig. 6). This may imply that alterations in gut flora are associated with improved lipid and blood glucose levels in individuals with prediabetes, potentially through diverse mechanisms. Negatively responding species were also positively correlated with many fecal metabolites. Within the network, B. wadsworthia, R. torques and the genus Fusobacterium emerged as pivotal microorganisms with the highest degree of connections, while HbA1c, TC and LDL-C were the most highly connected clinical indicators; this pattern was not observed in the CG (Fig. 6A, Figure S13A). In contrast, the plasma metabolite network showed that only a few metabolites, including methylcysteine and pipecolic acid, were associated with species and clinical indicators (Fig. 6B, Fig. S13B).

Fig. 6: Interactions between microbiota, metabolites and blood biomarkers.

Network diagram illustrating correlations among taxa, metabolites and clinical indicators at all time points in the IG, for (A) fecal metabolites and (B) plasma metabolites. The network involves significantly altered taxa (species and genera) and metabolites, displaying links with q(Treatment×Time) < 0.25 and q(Time) < 0.05. Node size reflects the number of connections and colors represent different data types. Line width between nodes indicates correlation strength, and line color indicates correlation direction. See Supplementary Data 1 for full names of metabolite abbreviations and The Human Metabolome Database (HMDB) numbers.
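The network construction can be pictured as: compute a pairwise rank correlation for every pair of features, keep the pairs that pass a filter, and count each node's connections (the quantity mapped to node size). This is a minimal sketch assuming Spearman correlation; in place of the q-value filter used for Fig. 6 it keeps pairs whose |rho| exceeds a hypothetical `RHO_CUTOFF`, and the feature values are invented for illustration.

```python
from collections import Counter
from itertools import combinations

RHO_CUTOFF = 0.7  # hypothetical stand-in for the q-value filter used in Fig. 6


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def node_degrees(edges):
    """Number of connections per node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


# Hypothetical per-sample values for two taxa and two clinical indicators.
profiles = {
    "B. wadsworthia": [5, 4, 3, 2, 1],
    "HbA1c": [6.2, 6.0, 5.9, 5.8, 5.7],
    "TC": [4.9, 5.2, 4.7, 5.0, 5.1],
    "E. rectale": [1, 2, 3, 4, 5],
}
edges = [
    (a, b)
    for a, b in combinations(profiles, 2)
    if abs(spearman(profiles[a], profiles[b])) > RHO_CUTOFF
]
degrees = node_degrees(edges)
print(edges)
print(degrees)
```

Rank correlation is a common choice for compositional, skewed microbiome abundances because it only assumes a monotone relationship.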

Bi-directional mediation analysis was performed within each group independently to investigate the links between species, metabolites and clinical outcomes. Specifically in the IG, we identified 33 and 14 forward mediation linkages for fecal and plasma metabolites, respectively (Fig. 7A, 7C), and 15 and 9 reverse mediation linkages for fecal and plasma metabolites, respectively (Figure S14A-B). No significant mediation linkages were found in the CG. For fecal metabolites, most linkages involved the impact of B. wadsworthia, R. hominis and R. gnavus on HbA1c and TC (Fig. 7A). We observed that b-deoxycholic acid (bDCA) and norcholic acid (NorCA) could mediate the impact of B. wadsworthia and R. hominis on HbA1c, respectively, and that 3-hydroxyisovaleric acid and oleylcarnitine could mediate the impact of B. wadsworthia and R. gnavus on TC, respectively (FDR forward mediation = 3.24 × 10⁻¹⁴, Fig. 7B). Most plasma metabolite linkages, on the other hand, involved the impact of R. hominis and R. faecis on LDL-C, mediated by pipecolic acid, methylcysteine and GLCA-3S (Fig. 7C-D).

Fig. 7: Mediation analysis identifies linkages between the species, metabolites and blood biomarkers.

A Parallel coordinates chart showing the 33 mediation effects of the fecal metabolites that were significant at q < 0.05. Shown are taxonomy (left), fecal metabolites (middle) and clinical outcomes (right). The curved lines connecting the panels indicate the mediation effects, with colors corresponding to different metabolites. B Mediation effect of some highlighted fecal metabolites on the levels of HbA1c and TC. C Parallel coordinates chart showing the 14 mediation effects of the plasma metabolites that were significant at q < 0.05. Shown are taxonomy (left), plasma metabolites (middle) and clinical outcomes (right). The curved lines connecting the panels indicate the mediation effects, with colors corresponding to different metabolites. D Mediation effect of some highlighted plasma metabolites on the levels of LDL-C and TC. For (B) and (D), the grey lines indicate the associations between the two factors, with corresponding LMM coefficients and q-values. Direct mediation is shown by a red arrow and reverse mediation is shown by a blue arrow. Corresponding q-values from mediation analysis are shown. See Supplementary Data 1 for full names of metabolite abbreviations and HMDB numbers.
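The logic of a single forward mediation linkage (taxon → metabolite → clinical marker) can be illustrated with the product-of-coefficients approach: path a is the effect of the taxon on the metabolite, path b is the effect of the metabolite on the marker adjusting for the taxon, and a × b is the indirect (mediated) effect. This is a simplified sketch assuming plain OLS on one observation per participant; it ignores the repeated-measures structure, covariates and the significance testing (e.g. bootstrapping) that the actual analysis would require, and the tool the authors used is not specified here. The reverse direction would reassign the exposure and mediator roles. All data values are hypothetical.

```python
def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def product_of_coefficients(x, m, y):
    """OLS mediation for x -> m -> y.

    Returns (a, b, c_prime, indirect): a = slope of m on x; b and c_prime =
    coefficients of m and x in the regression of y on both; indirect = a * b.
    """
    xc, mc, yc = _center(x), _center(m), _center(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    a = sxm / sxx
    det = sxx * smm - sxm ** 2  # zero if x and m are perfectly collinear
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    return a, b, c_prime, a * b


taxon = [0, 1, 2, 3, 4]        # hypothetical abundance change
metabolite = [1, 1, 4, 5, 9]   # hypothetical metabolite change
marker = [3, 4, 14, 18, 31]    # hypothetical clinical marker change
print(product_of_coefficients(taxon, metabolite, marker))
```

In this toy example the marker was constructed as 3 × metabolite + taxon, so the recovered b is 3, the direct effect is 1 and the indirect effect is a × b = 2 × 3 = 6.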

Discussion

Several RCTs have reported dietary interventions for managing prediabetes. For example, adherence to a Mediterranean diet lowered HbA1c43,44; carbohydrate restriction improved several health parameters45; a low-calorie, low-fat diet combined with increased physical activity reduced the risk of progression from prediabetes to diabetes46,47; a high-protein diet led to remission of prediabetes48; and long-term almond consumption led to better insulin responses49.

Our RCT is one of the first studies in Asia to explore whether a multi-component dietary intervention, on a background of calorie restriction, can improve metabolic health in people with prediabetes. Both the hypocaloric intervention and control diets led to significant weight loss and improvements in body composition over time. This, along with corresponding improvements in glycemic control and lipid profile, is consistent with prior observations on the metabolic health benefits of weight loss50,51,52. Furthermore, the intervention diet resulted in greater improvements in TC, LDL-C and HbA1c than the control diet. These were accompanied by alterations in the gut microbiome, with functional changes in carbohydrate fermentation and utilization in the gut and corresponding changes in metabolites (SCFA, amino acids and bile acids). We observed inter-relationships between the gut microbiome, metabolites and clinical outcomes, suggesting possible mechanisms for alleviating prediabetes.

Consumption of legumes has been shown to reduce TC, LDL-C and HbA1c, except in one study that reported increased HbA1c53; however, that study lasted only 8 weeks and had no control group53. A meta-analysis54 showed beneficial effects of legume consumption in RCTs both in the context of weight loss55,56,57,58 and without weight loss59,60,61,62,63,64. Besides soluble fiber, legumes also contain various phytonutrients known for their hypocholesterolemic effects65. In agreement with our results, a previous study reported similar changes in TC and HbA1c in participants who consumed ~190 g of legumes/day21. Other studies also reported significant reductions in FG and HOMA-IR in people with type 2 diabetes following a legume-based DASH diet compared with a control DASH diet66,67. Notably, participants in both of these studies were on oral antihyperglycemic agents, which may have potentiated the benefits of the food-based interventions.

Whilst most studies highlight the importance of dietary changes in managing T2D risk, they generally do not elucidate the mechanisms by which specific dietary modifications may affect the risk of progression to T2D in at-risk populations. More recently, it has been suggested that the gut microbiome changes in response to dietary adaptation32,33,34,68,69,70 and that these changes may significantly influence glucose metabolism and insulin sensitivity, thereby playing a crucial role in managing prediabetes and reducing the risk of T2D71,72,73,74. Our RCT therefore additionally sought to link the legume-rich intervention diet with corresponding changes in markers of T2D risk, in the gut microbiome and in microbially related metabolites.

Previous studies showed that bacterial species such as Eubacterium rectale and Roseburia faecis, and the genus Bifidobacterium, are altered by fiber interventions15,75, promote the growth of SCFA producers28 and confer health benefits such as reducing diabetes risk76,77. In our study, greater reductions were observed in the IG in species belonging to the genera Bacteroides and Ruminococcus, such as Ruminococcus torques, Ruminococcus gnavus and Ruminococcus lactaris. These species have previously been linked to diets lacking in fiber and inversely associated with the Mediterranean diet72,75, and have been related to impaired intestinal health and metabolic dysregulation78,79,80,81.

Given that the gut microbiota carries out a wide range of metabolic functions that aid in the breakdown of complex nutrients and non-nutrients, the species perturbations observed in the IG may have influenced metabolic activity, as predicted from metagenomic functions. Analysis of the genomic representation of glycoside hydrolases and CAZymes showed that the intervention diet increased the capacity to hydrolyse plant cell wall glycans and decreased the capacity to digest simpler carbohydrates. Genes involved in SCFA production pathways were also pertinent to our understanding of fiber degradation. Among the four SCFA production pathways, lysine degradation genes were downregulated in the IG, suggesting that SCFA production under a high-fiber diet may proceed via the acetyl-CoA, 4-aminobutyrate and glutarate pathways. Intriguingly, legumes are rich sources of lysine, methylcysteine and pipecolic acid82. Pipecolic acid is also a lysine degradation product formed via the piperidine intermediate, and has previously been associated with reduced inflammatory cytokines83. Given that plasma concentrations of methylcysteine and pipecolic acid, but not lysine, increased in the IG, these two metabolites could serve as biomarkers of legume consumption. Conversely, 1-methylhistidine, which decreased in the feces and plasma of the IG, is a recognized cardiovascular risk factor and has previously been shown to be substantially lower in vegetarians than in non-vegetarians34. Taken together, these amino acid changes suggest protective effects against metabolic disorders.

Multiple studies have highlighted the detrimental metabolic effects of elevated blood branched-chain amino acids (BCAAs)84,85. One mouse study found that oral administration of Parabacteroides merdae (Pm) promoted the degradation of gut BCAAs into branched-chain short-chain fatty acids, thereby reducing blood BCAA levels, inhibiting activation of the mTORC1 pathway in arterial plaque macrophages and exerting anti-atherosclerotic effects86. Previous research also indicated that elevated blood concentrations of IPA, a tryptophan derivative, are associated with a reduced risk of coronary artery disease (CAD) and T2D. Tryptophan metabolism also produces kynurenine, which modulates insulin sensitivity87. Furthermore, IPA has been shown to modulate intestinal barrier function through activation of the xenobiotic sensor pregnane X receptor (PXR)40,88,89. Lastly, imidazole propionate (IMP), a histidine derivative, is known to impair insulin signaling through activation of mTORC1 and to inhibit the action of metformin90,91. The reductions in both intestinal and plasma IMP observed in the IG may therefore have contributed to the diet-related health outcomes.

Endogenous cholesterol is used by the liver to produce bile acids, and approximately 95% of bile acids are reabsorbed in the intestine and transported back to the liver via the hepatic portal vein. The reduced levels of intestinal lithocholic acid (LCA), deoxycholic acid (DCA) and cholic acid (CA), together with decreased LDL-C and TC in the IG, may suggest reduced enterohepatic circulation of bile acids to the liver, inducing the liver to convert more cholesterol into bile acids and thereby lowering LDL-C. In plasma, only GLCA-3S increased, which may offer an alternative explanation for the overall decrease in intestinal bile acids, via gut microbiome modifications. GLCA-3S was previously shown to be correlated with dietary diversity92.

Network and causal mediation analyses further linked the observed changes in gut microbiome composition and metabolites with the biomarker outcomes. The decrease in B. wadsworthia in the IG was correlated with a decrease in HbA1c, mediated by a decrease in the intestinal bile acid bDCA; B. wadsworthia has been reported to be enriched in patients with T2D93. Likewise, the increase in R. bromii in the IG was correlated with a decrease in LDL-C, mediated by an increase in the plasma bile acid GLCA-3S; R. bromii was previously reported to be negatively correlated with apolipoprotein B-100, the principal apolipoprotein of LDL94. These mediation results point to secondary bile acids as possible links between the gut microbiome and cardiometabolic biomarkers, which warrants further research.

A strength of this study was its dietary regime: 2 of the 3 main meals per day were provided (contributing ~60% of total energy and >66% of total protein intake), while the remaining dietary intake was controlled for total calorie intake through regular 4-weekly assessments and dietary consultations with a research dietitian. This enabled greater control of diet quality and quantity while allowing some flexibility for the study to be undertaken in a ‘free-living’ environment, facilitating longer-term adherence. The excellent retention rate and high reported compliance in our study population support this.

Although the multicomponent nature of the intervention diet (a hypocaloric diet alongside legumes, blended oil, low GI foods and spices) makes it challenging to pinpoint the contribution of individual components to the overall effect, several recent reports have highlighted that dietary patterns are more effective than individual foods and nutrients95. One potential limitation of this partial-feeding design is the fixed dose of intervention meal components, which may lead to differences between individuals in the relative proportion of calories from non-intervention foods, since total caloric requirements vary. However, this design was deemed necessary to promote compliance with the demanding study protocol over an extended period (16 weeks) and to provide better external validity: from a public health perspective, fixed-portion dietary advice is easier to follow than advice that tailors specific intervention components to individual body weight.

In summary, we demonstrated that a multi-component diet rich in legumes, low GI starches, healthy vegetable oils and spices can improve cardiometabolic health above and beyond effects that could be achieved through calorie restriction alone. This creates opportunities to design effective, longer-term food-based dietary interventions in Asia to reduce risks of T2D and CVD, especially in populations with greater predisposition to such diseases. We have also demonstrated that the favourable effects on the cardiometabolic biomarkers may be mediated through the composition and function of the gut microbiome, although some of these associations need to be further validated in other independent cohorts.

Methods

Study design and participants

This single-blinded, parallel-design clinical trial was registered at clinicaltrials.gov (identifier no. NCT04745702), approved by the National Healthcare Group (NHG) Domain Specific Review Board ethics committee, Singapore (reference: C/2019/00997) and conducted in accordance with the Declaration of Helsinki 1983 and the Singapore Good Clinical Practice guidelines. The study was performed in men and women of Chinese ethnicity with prediabetes in Singapore, aged 45 to 75 years, with a body mass index (BMI) between 19.5 and 32.0 kg/m², who gave written informed consent. Sex of the participants was determined by self-report. According to the American Diabetes Association (ADA) guideline, prediabetes was defined as meeting one or more of the following criteria: (1) fasting blood glucose >5.5 mmol/l and <7.0 mmol/l, (2) oral glucose tolerance test (OGTT) level ≥7.8 mmol/l and ≤11.0 mmol/l, or (3) hemoglobin A1c (HbA1c) level ≥5.7% and ≤6.4%. The main exclusion criteria included smoking, food allergies or intolerances to common food ingredients, dietary restrictions, sustained hypertension (>160/95 mmHg), history of metabolic, cardiac, liver or kidney disorders, thyroid dysfunction and recent weight loss.
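Because a volunteer qualifies on any one of the three criteria, the screening rule is a logical OR over the measured values. The sketch below restates the thresholds exactly as written above; the function name is hypothetical, and `None` stands for a measurement that was not taken.

```python
from typing import Optional


def meets_prediabetes_criteria(
    fasting_glucose: Optional[float] = None,  # mmol/L
    ogtt_2h: Optional[float] = None,          # mmol/L, 2-h glucose after a 75 g load
    hba1c: Optional[float] = None,            # %
) -> bool:
    """True if at least one of the stated criteria is met; None means not measured."""
    checks = [
        fasting_glucose is not None and 5.5 < fasting_glucose < 7.0,
        ogtt_2h is not None and 7.8 <= ogtt_2h <= 11.0,
        hba1c is not None and 5.7 <= hba1c <= 6.4,
    ]
    return any(checks)


# Normal fasting glucose but HbA1c in the prediabetic range qualifies.
print(meets_prediabetes_criteria(fasting_glucose=5.4, hba1c=5.9))  # True
```

Note the mix of strict and inclusive bounds: fasting glucose uses open bounds, while the OGTT and HbA1c ranges include their end points.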

Recruitment was conducted via newspaper advertisements, word of mouth and the recruitment database of the research center [Clinical Nutrition Research Center (CNRC), Singapore]. Interested volunteers attended a Screening and Consent Visit after an overnight fast, at which study suitability was assessed using a Health and Lifestyle Questionnaire, anthropometric measurements and prediabetes screening based on the ADA criteria. The main measurements taken included volunteers’ height (Seca 217; Seca GmbH, Germany), weight (Tanita BC-418; Tanita, Japan), FG/OGTT level (HemoCue® 201; Radiometer, Denmark) and HbA1c level (DCA Vantage Analyser; Siemens, Tarrytown, USA). The 2-hour OGTT was performed using a 75-g glucose solution according to ADA guidelines96. The trial was conducted from November 2020 to July 2021. All participants were reimbursed for their travel expenses and time.

Dietary intervention and assessment

Eligible volunteers were randomly assigned to one of two arms, intervention or control, with randomization performed separately for each sex. The initial random assignment of one volunteer to each arm was conducted using GraphPad Prism, and subsequent volunteers were allocated by minimization in R with a 1:1 allocation ratio. The BMI measured at the screening visit was used as the covariate to ensure that the two arms were balanced. The randomization method was implemented in R following the methods suggested by Lin et al.97.
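One way to picture the procedure is the sketch below: within each sex stratum, the first two volunteers are split randomly between the arms, and each later volunteer is assigned to the arm that would leave the arms most balanced, first in size (for the 1:1 ratio) and then in mean BMI. This is an illustrative scheme, not the authors' R implementation; the imbalance measure, the hypothetical follow probability `p_follow` and all names are assumptions, and published minimization variants differ in how imbalance is scored and how randomness is injected.

```python
import random
from statistics import mean

ARMS = ("IG", "CG")


def _imbalance(stratum, arm, bmi):
    """(size gap, mean-BMI gap) that would result from adding `bmi` to `arm`."""
    ig = stratum["IG"] + ([bmi] if arm == "IG" else [])
    cg = stratum["CG"] + ([bmi] if arm == "CG" else [])
    return abs(len(ig) - len(cg)), abs(mean(ig) - mean(cg))


def allocate(volunteers, p_follow=0.8, seed=None):
    """Allocate (id, sex, bmi) tuples to arms, stratified by sex."""
    rng = random.Random(seed)
    strata, allocation = {}, {}
    for vid, sex, bmi in volunteers:
        stratum = strata.setdefault(sex, {"IG": [], "CG": []})
        n = len(stratum["IG"]) + len(stratum["CG"])
        if n == 0:
            arm = rng.choice(ARMS)                      # first volunteer: random arm
        elif n == 1:
            arm = "CG" if stratum["IG"] else "IG"       # second: the other arm
        else:
            scores = {a: _imbalance(stratum, a, bmi) for a in ARMS}
            if scores["IG"] == scores["CG"]:
                arm = rng.choice(ARMS)                  # tie: randomize
            else:
                best = min(ARMS, key=scores.get)
                other = "CG" if best == "IG" else "IG"
                # Follow the minimizing arm with probability p_follow.
                arm = best if rng.random() < p_follow else other
        stratum[arm].append(bmi)
        allocation[vid] = arm
    return allocation


# Hypothetical screening data: (id, sex, BMI in kg/m2).
volunteers = [(1, "F", 24.1), (2, "F", 27.3), (3, "M", 22.8), (4, "F", 25.0),
              (5, "M", 29.4), (6, "F", 21.9), (7, "M", 26.2), (8, "M", 24.4)]
alloc = allocate(volunteers, p_follow=1.0, seed=2020)
print(alloc)
```

With `p_follow=1.0` the size criterion guarantees that arms within each stratum never differ by more than one volunteer; values below 1 keep allocations less predictable at the cost of some balance.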

Throughout the 16-week period, volunteers replaced 2 main meals/day with 2 specially formulated meals, provided as frozen, pre-cooked, low-calorie ready meals (approximately 400 kcal each). Volunteers in the intervention group (IG) received meals consisting of legumes (mixed beans, red kidney beans or chickpeas; 100 g per meal), a portion of meat analogue (textured vegetable soy protein), a portion of vegetables and low GI rice or noodles, with added mixed herbs and spices, prepared with blended vegetable oil98. Those in the control group (CG) received meals consisting of a portion of meat (chicken or fish), a portion of vegetables and medium to high GI rice or noodles, with negligible herbs and spices, prepared with refined corn oil. The meals in both groups were isocaloric. Additionally, male volunteers were provided with snacks containing either wholegrain cereals and nuts/seeds (IG) or refined cereal snack products (CG) to supplement their meals (~100 kcal per serving). The total energy, macronutrient and sodium contents of the provided meals and snacks are shown in Table S2.

The remaining daily foods and snacks were consumed ad libitum, with individualized recommendations to limit caloric intake so as to induce a 5% weight loss in all volunteers (both CG and IG). These recommendations were given once every 4 weeks by a research dietitian. The advice was unbiased but tailored to each volunteer’s estimated energy requirement (calculated using the Oxford equations99), with an estimated 500 kcal deficit. Only quantitative advice on total caloric intake was provided, with no qualitative recommendations. Volunteers were encouraged to retain their habitual background diet for the remaining daily foods and snacks and to maintain their habitual physical activity (PA) level throughout the study period. This was done to avoid confounding from the research dietitian’s involvement in the study.
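The calorie advice reduces to simple arithmetic: the daily target is the estimated energy requirement minus the 500 kcal deficit, and what remains for ad libitum foods is the target minus the two provided meals (and any provided snacks). The sketch assumes the energy requirement has already been estimated (for example, BMR from the Oxford equations multiplied by an activity factor; the equation coefficients are not reproduced here), and `snack_kcal` is a hypothetical parameter.

```python
PROVIDED_MEAL_KCAL = 400  # approximate energy per provided ready meal
MEALS_PER_DAY = 2
DEFICIT_KCAL = 500


def daily_energy_plan(eer_kcal, snack_kcal=0.0):
    """Return (daily target, remaining ad libitum budget) in kcal.

    eer_kcal: estimated energy requirement computed beforehand.
    snack_kcal: energy from provided snacks, if any (hypothetical parameter).
    """
    target = eer_kcal - DEFICIT_KCAL
    remaining = target - MEALS_PER_DAY * PROVIDED_MEAL_KCAL - snack_kcal
    return target, remaining


target, remaining = daily_energy_plan(2100)
print(target, remaining)
```

For a hypothetical requirement of 2,100 kcal, the target is 1,600 kcal, of which about 800 kcal remains for self-selected foods after the two provided meals.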

Volunteers were required to attend measurement visits every 4 weeks, with additional compliance checks every 2 weeks. Before each measurement visit, volunteers completed a 3-day food diary for dietary assessment. The food records were entered into the FoodWorks 10 Professional software system (version 10.0.4266, Xyris Software Pty Ltd, Queensland, Australia) to estimate each volunteer’s energy, macronutrient and sodium intake. The food databases in the Nutritionist Buddy (nBuddy) app100 and the Nutrition Information Panels (NIP) of individual foods and drinks were also consulted, especially for local foods with no data available in the FoodWorks databases. Random qualitative checks on food record entry were performed by another member of the study team. To monitor reported compliance with the study meals, each volunteer also kept an Intervention Food Record (IFR) daily throughout the study period. Each time volunteers finished an intervention meal, they were instructed to note the specific meal code in the appropriate column. From the record, we analyzed whether participants consumed 2 meals/day as per protocol. If a participant missed a meal, they were asked to write the reason in the ‘Remarks’ column.
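Reported compliance from the IFR can be summarized as the share of expected meals (2 per study day) that were logged, plus the days on which fewer than 2 meals were recorded. This is a minimal sketch assuming the paper records have been transcribed as `(day, meal_code)` pairs; the record format, meal codes and function name are hypothetical, and capping each day at 2 meals is a design choice so that extra entries cannot offset missed days.

```python
from collections import defaultdict

EXPECTED_MEALS_PER_DAY = 2  # protocol: 2 provided meals per day


def ifr_compliance(entries, study_days):
    """Return (% of expected meals logged, days with fewer than 2 meals).

    entries: iterable of (day, meal_code) tuples, one per finished meal,
             with days numbered from 1.
    """
    meals_per_day = defaultdict(int)
    for day, _meal_code in entries:
        meals_per_day[day] += 1
    logged = sum(min(n, EXPECTED_MEALS_PER_DAY) for n in meals_per_day.values())
    pct = 100.0 * logged / (EXPECTED_MEALS_PER_DAY * study_days)
    short_days = [d for d in range(1, study_days + 1)
                  if meals_per_day.get(d, 0) < EXPECTED_MEALS_PER_DAY]
    return pct, short_days


# Hypothetical 3-day excerpt: one meal missed on day 2.
entries = [(1, "IG-A"), (1, "IG-B"), (2, "IG-C"), (3, "IG-A"), (3, "IG-D")]
print(ifr_compliance(entries, study_days=3))
```

The list of short days is what would be cross-checked against the reasons written in the ‘Remarks’ column.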

Physical activity assessments

The physical activity level of volunteers was assessed at measurement visits using the International Physical Activity Questionnaire (IPAQ; long form). Volunteers completed the 27 IPAQ questions by recalling their activities over the past seven days in 5 categories: 1) job-related PA; 2) transportation PA; 3) housework, house maintenance and caring for family; 4) recreation, sport and leisure-time PA; and 5) time spent sitting101. IPAQ data were analyzed using the standardized IPAQ scoring protocol102 and expressed as metabolic equivalent (MET)-minutes per week, calculated by multiplying the MET intensity by the minutes of each activity over the seven days, and then averaged to MET-minutes per day. Incomplete IPAQ data for specific visits, or data with a total PA exceeding 960 min per day, were excluded from analysis. Reported durations between 0 and 10 min were rounded up to 10 min.
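The scoring steps above (round short durations up, exclude implausible totals, multiply MET intensity by minutes and days) can be sketched as follows. The MET values 3.3, 4.0 and 8.0 for walking, moderate and vigorous activity are assumed from the IPAQ scoring protocol; the long form also assigns some domain-specific values that are not reproduced here, and the input format is hypothetical.

```python
# Assumed IPAQ MET values; long-form domain-specific values are not shown.
MET = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}
MAX_TOTAL_MIN_PER_DAY = 960


def clean_minutes(minutes):
    """Round durations between 0 and 10 min up to 10 min, as described above."""
    return 10 if 0 < minutes < 10 else minutes


def met_minutes(activities):
    """activities: list of (intensity, minutes_per_day, days_per_week).

    Returns (MET-min/week, MET-min/day), or None when the summed daily
    duration exceeds 960 min and the record is excluded.
    """
    cleaned = [(kind, clean_minutes(mins), days) for kind, mins, days in activities]
    total_daily = sum(mins for _, mins, days in cleaned if days > 0)
    if total_daily > MAX_TOTAL_MIN_PER_DAY:
        return None
    weekly = sum(MET[kind] * mins * days for kind, mins, days in cleaned)
    return weekly, weekly / 7


# Hypothetical recall: the 5-min moderate bout is rounded up to 10 min.
print(met_minutes([("walking", 30, 5), ("moderate", 5, 2), ("vigorous", 20, 3)]))
```

In this example the weekly total is 3.3 × 30 × 5 + 4.0 × 10 × 2 + 8.0 × 20 × 3 = 1,055 MET-min, or about 151 MET-min per day.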

Clinical assessments

The primary outcomes of this study included several markers of glucose homeostasis: HbA1c, FG, fasting insulin and OGTT results. The secondary outcomes were lipid profile, fructosamine and adiponectin. We also measured other outcomes, including body composition, the gut microbiome, and fecal and plasma metabolites. Clinical assessments were carried out after an overnight fast (minimum of 10 h) at 4-week intervals, on Weeks 0, 4, 8, 12 and 16. At each measurement visit, HbA1c point-of-care testing (DCA Vantage Analyser; Siemens, Tarrytown, USA), blood pressure (Omron HEM907, Japan) and various anthropometric measurements were taken. These included height using a stadiometer (Seca 217; Seca GmbH, Germany), body weight and body fat percentage using bioelectrical impedance analysis (BIA) (Tanita BC-418; Tanita, Japan) and whole-body composition by Dual-Energy X-Ray Absorptiometry (DXA) (QDR 4500 A, fan-beam densitometer; Hologic, Waltham, MA, USA) according to the manufacturer’s instructions. Waist circumference was measured with a measuring tape in a horizontal plane midway between the lower rib border and the iliac crest, and hip circumference was measured at the widest point over the buttocks. Neck circumference was measured with the same tape at mid-neck height, perpendicular to the long axis of the neck and, in male volunteers, just below the laryngeal prominence. Mid-upper arm circumference was measured at the midpoint between the acromion process and the olecranon. Before DXA scanning, volunteers were required to remove all metal items to prevent interference, and the manufacturer-provided software (version 8.21) was used to generate and analyse the data. The “android region” spans the area between the ribs and the pelvis, while the “gynoid region” overlaps the leg and trunk, covering the hips and upper thighs. The software also estimated visceral fat mass within the L1-L5 region.
Fasting blood was also drawn by venipuncture (using needle and syringe) at each measurement visit. HOMA-IR was calculated using the following formula, with fasting glucose in mmol/L and fasting insulin in µU/mL:

\(\text{HOMA-IR} = \left(\text{Fasting glucose} \times \text{Fasting insulin}\right) / 22.5\)
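As a worked instance of the formula, a fasting glucose of 5.5 mmol/L and a fasting insulin of 9 µU/mL give HOMA-IR = 49.5 / 22.5 = 2.2. The helper below is a hypothetical one-liner that keeps the units explicit in its parameter names; if glucose were recorded in mg/dL instead, the equivalent constant would be 405 (22.5 × 18).

```python
def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uu_ml):
    """HOMA-IR = (fasting glucose [mmol/L] x fasting insulin [uU/mL]) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_uu_ml / 22.5


print(homa_ir(5.5, 9.0))
```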

Blood sampling and analyses

Blood samples were drawn into two types of Vacutainer tubes (BD, USA), serum and EDTA tubes, for primary and secondary outcome analyses. Serum tubes were left to stand until the samples had clotted and were centrifuged within 1 h of collection at 1500 × g for 10 min at 4 °C. EDTA tubes were centrifuged within 45 min of collection at the same settings. The supernatants were aliquoted into cryotubes and stored at −80 °C until analysis. Serum glucose, insulin and the blood lipid panel were analyzed using the standardized clinical chemistry methods of the National University Hospital Referral Laboratories, Singapore. Glucose and the blood lipid panel [TC, LDL-C (direct measurement), HDL-C and TG] were measured on an AU 5800 clinical chemistry analyzer (Beckman Coulter, Inc., USA) using enzymatic colorimetric assays, and insulin was measured using the UniCel DxI 800 Access Immunoassay System (Beckman Coulter, Inc.).

Adiponectin and leptin

Serum adiponectin and leptin levels were measured by enzyme-linked immunosorbent assay (ELISA), according to the instructions provided in commercially available kits (Human Total Adiponectin/Acrp30 Immunoassay, DRP300, Quantikine ELISA and Human Leptin ELISA Kit, RAB0333, Sigma-Aldrich).

Fecal sample collection

Volunteers who opted in to stool sample collection collected their samples at home before arriving at the research center for each measurement visit at weeks 0, 2, 4, 8, 12 and 16. Stool samples were collected according to the manufacturers’ instructions using the provided kits. These included a DNA/RNA Shield fecal collection tube (Zymo Research, California, USA), the OMNImet.GUT kit (DNA Genotek, Ottawa, Canada) and BIOME-Preserve (Anaerobe Systems, California, USA), which preserves anaerobic microbiome communities at room temperature. The collected samples were transported in an insulated aluminium foil bag, and were aliquoted and stored at −80 °C on receipt in the laboratory.

DNA isolation and metagenomics sequencing

DNA was isolated from 450 μL aliquots of stool samples preserved in DNA/RNA Shield (Zymo Research). After treatment with 20 µL of 5 µg/mL lysozyme (Sigma-Aldrich), 20 µL of 10 mg/mL Proteinase K (Promega) and 10 µL of 10 mg/mL RNaseA (Novagen) at 37 °C for 15 min and then at 60 °C for 15 min, samples were homogenized twice in Lysing Matrix E tubes (MP Biomedicals) at 6 m/s for 40 s. After centrifugation, the supernatant was treated with 10 µL of 10 mg/mL RNaseA (Novagen) at 60 °C for 30 min. The mixture was purified by phenol:chloroform (1:1) extraction, followed by DNA isolation using the Maxwell® 16 FFS Nucleic Acid Extraction Kit (Promega) according to the manufacturer’s protocol. PCR-free genomic DNA libraries were used for shotgun sequencing on an Illumina NovaSeq 6000, delivering at least 7 GB of data per sample (Novogene) (Supplementary Data 2). The workflow was validated using the ZymoBIOMICS Gut Microbiome Standard (Zymo Research) before large-scale DNA isolation and sequencing.

Fecal and blood metabolome

Fecal aliquots (n = 333 samples) preserved in the OMNImet®•GUT (ME-200) kit (DNA Genotek) and blood aliquots (n = 332 samples) from three time points (W0, W4 and W16) were used for metabolome analysis.

Metabolites extraction

Samples were thawed in an ice bath to minimize degradation. A 20 μL aliquot of each plasma or fecal sample was added to a 96-well plate, which was then transferred to an Eppendorf epMotion Workstation. A volume of 120 μL of ice-cold methanol containing a subset of internal standards was automatically added to each sample, followed by vigorous vortexing for 5 min. The plate was centrifuged at 4000 × g for 30 min. Then, 30 μL of supernatant was transferred to a clean 96-well plate, and 20 μL of freshly prepared derivatization reagent was added to each well. The plate was sealed and derivatization was carried out at 30 °C for 60 min. After derivatization, 330 μL of ice-cold 50% methanol solution was added to dilute the samples. The plate was then stored at −20 °C for 20 min, followed by centrifugation at 4000 × g and 4 °C for 30 min. Then, 135 μL of supernatant was transferred to a new 96-well plate containing 10 μL of internal standards in each well. Serial dilutions of derivatized stock standards were added to the remaining wells. Finally, the plate was sealed for LC-MS analysis. For quality control, we prepared 31 QC samples each for fecal and plasma samples, consisting of equal-volume mixtures of the experimental samples.

HPLC-MS/MS analysis

An ultra-performance liquid chromatography coupled to tandem mass spectrometry (UPLC-MS/MS) system (ACQUITY UPLC-Xevo TQ-S, Waters Corp., Milford, MA, USA) was used to quantitate targeted metabolites using the N300 metabolomics kit at Novogene Co., Ltd. (Beijing, China). Samples were injected onto an ACQUITY UPLC BEH C18 1.7 µm VanGuard pre-column (2.1 × 5 mm) and an ACQUITY UPLC BEH C18 1.7 µm analytical column (2.1 × 100 mm) using an 18-min linear gradient at a flow rate of 0.4 mL/min in positive/negative polarity mode. The eluents were eluent A (0.1% formic acid in water) and eluent B (acetonitrile:isopropanol = 70:30). The solvent gradient was set as follows: 0-1 min (5% B), 1-11 min (5-78% B), 11-13.5 min (78-95% B), 13.5-14 min (95-100% B), 14-16 min (100% B), 16-16.1 min (100-5% B), 16.1-18 min (5% B). The Xevo TQ-S mass spectrometer was operated in positive (negative) polarity mode with a capillary voltage of 1.5 (2.0) kV, a source temperature of 150 °C, a desolvation temperature of 550 °C and a desolvation gas flow of 1000 L/h.
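The solvent program is a piecewise-linear profile, so the nominal eluent composition at any point in the run follows by interpolating between the stated breakpoints. This sketch encodes the breakpoints listed above; `percent_b` is a hypothetical helper, and it describes the programmed gradient only, not the delayed composition that actually reaches the column.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints from the gradient described above
GRADIENT = [
    (0.0, 5.0), (1.0, 5.0), (11.0, 78.0), (13.5, 95.0),
    (14.0, 100.0), (16.0, 100.0), (16.1, 5.0), (18.0, 5.0),
]
TIMES = [t for t, _ in GRADIENT]


def percent_b(t_min):
    """Programmed %B at time t_min, by linear interpolation between breakpoints."""
    if not TIMES[0] <= t_min <= TIMES[-1]:
        raise ValueError("time outside the 18-min run")
    i = bisect_right(TIMES, t_min) - 1
    if i == len(GRADIENT) - 1:  # exactly at the final breakpoint
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0.5, 6.0, 12.25, 15.0, 17.0):
    print(t, percent_b(t), 100 - percent_b(t))  # %B and %A
```

For example, halfway through the 1-11 min ramp (t = 6 min) the programmed composition is 41.5% B.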

Metabolites identification and quantification

Experimental samples were analyzed by multiple reaction monitoring (MRM) using a method developed in-house by Novogene. Q1, Q3, retention time (RT), declustering potential (DP) and collision energy (CE) were used for metabolite identification. Standard curves were constructed from the ratio of the Q3 peak area of each compound to that of its internal standard, and compound concentrations were calculated relative to the known internal standard concentration. Data files generated by UPLC-MS/MS were processed using MassLynx version 4.1 for peak integration and correction. Values below detection limits were imputed using k-nearest neighbors (kNN). The list of metabolites is provided in Supplementary Data 1.
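Internal-standard quantification amounts to fitting a calibration line of analyte/internal-standard peak-area ratio against the known concentrations of the standards, then inverting that line for each sample. This is a minimal sketch assuming an unweighted least-squares straight-line fit; the vendor method may use weighting or a different curve model, kNN imputation is not shown, and all names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(analyte_area, is_area, std_conc, std_ratio):
    """Analyte concentration from its peak-area ratio to the internal standard.

    std_conc: known concentrations of the calibration standards.
    std_ratio: analyte/internal-standard area ratios measured for those standards.
    """
    slope, intercept = fit_line(std_conc, std_ratio)
    ratio = analyte_area / is_area
    return (ratio - intercept) / slope


# Hypothetical calibration series (concentration units arbitrary).
std_conc = [1, 2, 4, 8]
std_ratio = [0.6, 1.1, 2.1, 4.1]
print(quantify(4200, 2000, std_conc, std_ratio))  # area ratio 2.1 on this curve
```

Normalizing to the internal standard corrects for injection-to-injection variation in recovery and ionization, which is why the ratio, not the raw area, is calibrated.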

In vitro validation experiments

We performed in vitro experiments using fecal samples from six participants (three from the intervention group and three from the control group), which were inoculated into medium supplemented with either in vitro digested or undigested red kidney bean powder.

All preparation procedures and experiments were performed under anaerobic conditions. To prepare the inocula, the six fecal samples (three from the intervention group and three from the control group), preserved in the BIOME-Preserve Anaerobic Microbiome Collection Kit (Anaerobe Systems), were strained through a 100 µm nylon mesh (Fisherbrand). The amount of inoculum for each sample was adjusted to give a final OD600 of 0.02 in 10 mL of growth medium (KH2PO4 10.9 g/L, K2HPO4 3.5 g/L, NaHCO3 2 g/L, NaCl 1.242 g/L, NH4Cl 2.5 g/L, MgSO4·7H2O 0.5 g/L, CaCl2 3 mM, FeSO4·7H2O 5 mg/L, MES 50 mM, mucin 4 g/L, Tween-80 0.25 mL/L, cholate 0.5 g/L, chenodeoxycholate 0.5 g/L, vitamin K1 10 µL/L, folate 2 mg/L, vitamin B12 0.1 mg/L, L-cysteine 0.3%). The medium was degassed in an anaerobic chamber overnight before the vitamins and L-cysteine were added, and the pH was adjusted to 7.0. Each sample was grown at 37 °C for 48 h under anaerobic conditions in medium containing 10 g/L of fiber from cooked red kidney bean or brown rice powder, either digested or undigested. In vitro digestion of the substrates was carried out based on the INFOGEST international consensus protocol103. Changes in the microbial community were analyzed at 0, 24 and 48 h post-inoculation. To assess reproducibility, some samples were run in duplicate.
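Adjusting each inoculum to a final OD600 of 0.02 in 10 mL is a C1V1 = C2V2 dilution. The sketch assumes that OD600 scales linearly with dilution over this range and that the inoculum volume counts toward the 10 mL final volume; the function name and the example stock OD are hypothetical.

```python
def inoculum_volume_ml(od_stock, od_target=0.02, final_volume_ml=10.0):
    """Volume of fecal slurry giving `od_target` in `final_volume_ml` of culture.

    Assumes OD600 is linear in dilution (C1*V1 = C2*V2) and that the
    inoculum is part of the final volume.
    """
    if od_stock <= od_target:
        raise ValueError("stock OD600 must exceed the target OD600")
    return od_target * final_volume_ml / od_stock


v = inoculum_volume_ml(0.8)
print(f"{v:.2f} mL slurry + {10.0 - v:.2f} mL medium")
```

A strained slurry with a hypothetical OD600 of 0.8 would thus need 0.25 mL topped up with 9.75 mL of medium.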

Bioinformatics processing

From the raw metagenomic sequencing data, low-quality reads were discarded by the sequencing facility, and reads derived from the human genome were removed by mapping to the human reference genome (NCBI37) using KneadData (v0.12.0)104 and Bowtie2 (v2.3.5.1)105. Microbial taxonomic profiles were generated using MetaPhlAn3 (v3.0.14). Microbial metabolic pathways were profiled using HUMAnN3 (v3.6.1)104, which maps DNA/RNA reads to a customized database of functionally annotated pan-genomes. HUMAnN3 reports the abundances of gene families from the UniProt Reference Clusters (UniRef90)106, which were further mapped to microbial pathways in the MetaCyc metabolic pathway database107. Gene family abundances were also mapped to the Enzyme Commission (EC) annotations provided in the HUMAnN database. Glycoside hydrolases were defined as all enzymes with EC numbers beginning with EC:3.2.1, and these were analyzed further.
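Selecting glycoside hydrolases from an EC-level abundance table reduces to a prefix match on the EC number. The sketch below assumes HUMAnN-style row identifiers in which the EC number comes first, optionally followed by `: name`, and in which taxon-stratified rows contain `|`; the helper `glycoside_hydrolases` is hypothetical. Matching on `3.2.1.` with the trailing dot restricts the selection to the EC 3.2.1 sub-subclass.

```python
def glycoside_hydrolases(ec_table):
    """Return {EC number: abundance} for community-level rows in EC 3.2.1.

    ec_table maps row identifiers (e.g. "3.2.1.23: Beta-galactosidase")
    to abundances. Stratified rows ("...|g__Genus.s__species") are skipped
    so that each enzyme is counted once at the community level.
    """
    selected = {}
    for row_id, abundance in ec_table.items():
        if "|" in row_id:
            continue
        ec = row_id.split(":", 1)[0].strip()
        if ec.startswith("3.2.1."):
            selected[ec] = selected.get(ec, 0.0) + abundance
    return selected
```

Rows such as `UNMAPPED` or `UNGROUPED`, and hydrolases outside EC 3.2.1 (for example EC 3.2.2), fall through the prefix test and are excluded.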

High-quality paired-end reads from each sample were assembled de novo into contigs of at least 500 bp using Megahit (v1.2.9)108. Genes were predicted using MetaGeneMark (v3.38)109. A non-redundant gene catalog of 14,386,339 microbial genes was constructed with CD-HIT (v4.8.1)110 using a sequence identity threshold of 0.95 (-c 0.95). Salmon (v1.9.0)111 was used to estimate gene abundance in each sample. Genes in the non-redundant catalog were functionally annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database112 using KofamScan (v1.3.0)113, yielding 7,568 KEGG orthologs (KOs); abundances of genes annotated to the same KO were summed. Reads were also aligned to curated databases of proteins involved in SCFA (acetic, propionic and butyric acid) production114 using DIAMOND (v0.9.30)115. Genes encoding carbohydrate-active enzymes (CAZymes) were identified using dbCAN2116, and abundances of genes belonging to the same CAZy family were summed.
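Collapsing gene-level abundances to KEGG orthologs or CAZy families, as described above, is a group-and-sum. A minimal sketch, assuming each gene carries at most one annotation (genes with several KOs or CAZy families would need an explicit rule, which the text does not specify); `summarize_by_annotation` and the example identifiers are hypothetical.

```python
from collections import defaultdict


def summarize_by_annotation(gene_abundance, gene_to_family):
    """Sum per-gene abundances (e.g. Salmon estimates) within each annotation.

    gene_abundance: {gene_id: abundance}
    gene_to_family: {gene_id: KO or CAZy family}; genes absent from this
    mapping are treated as unannotated and dropped.
    """
    totals = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        family = gene_to_family.get(gene)
        if family is None:
            continue
        totals[family] += abundance
    return dict(totals)
```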

Statistical analysis

Data were analyzed on an intention-to-treat basis without imputation of missing values. Normality was assessed visually using histograms, boxplots and quantile-quantile plots.

Sample size calculations

Sample size was estimated from a between-group difference in HbA1c change of approximately 0.25% at the end of the trial, a conservative estimate taken from a previous study and considered clinically meaningful48. The variance was assumed to be similar in our prediabetic population. With 80% power and an alpha of 0.05, a total of 110 volunteers was required. Allowing for an attrition rate of 13%, the final required sample size was 125. Sample size was calculated using G*Power 3.1.9.2.
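For a two-group comparison of a continuous outcome, the per-group sample size depends only on the standardized effect size d (the between-group difference divided by the standard deviation), alpha and power. The sketch below uses the normal approximation n = 2(z(1 - alpha/2) + z(power))^2 / d^2 per group; G*Power uses the exact noncentral t distribution and therefore returns slightly larger numbers. The standard deviation behind the 0.25% HbA1c difference is not reported here, so `effect_size_d` is a placeholder, not the trial's value, and `n_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)  # quantile corresponding to the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)
```

For example, `n_per_group(0.5)` returns 63 per group (126 in total). Smaller standardized effects increase the required size quadratically.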

PERMANOVA analysis

To further evaluate the effect of the intervention on microbiome composition over time, we constructed PERMANOVA models using the “adonis2” function of the vegan package (version 2.6.4) in R (version 4.3.1). Each variable was tested in a separate model that included its interaction with treatment group:

$$\text{Bray-Curtis distance} \sim \text{Treatment} * \text{Variable}$$
(1)

We also constructed separate PERMANOVA models to assess each variable independently within each treatment group:

$$\text{Bray-Curtis distance} \sim \text{Variable}$$
(2)

In the above tests, each variable was tested in a separate model to avoid dependence on the order in which terms enter the model. Permutations (999 iterations) were stratified by subject to account for the longitudinal structure of the data. Because PERMANOVA can detect differences in multivariate dispersion as well as in group centroids, we also performed a beta dispersion test and confirmed that it was not significant, so that significant PERMANOVA results could be attributed to shifts in group centroids. To determine whether the explained variance arose from earlier or later timepoints, we further performed PERMANOVA on subsets of the data restricted to specific time periods, using the same formulas.
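PERMANOVA partitions squared distances into among- and within-group sums of squares and assesses the resulting pseudo-F statistic by permutation; stratifying by subject means labels are shuffled only among samples from the same participant. The following is a minimal one-factor sketch under those assumptions, not vegan's `adonis2` implementation (which also handles multiple terms and interactions). The function names are hypothetical, and `dist` is assumed to be a full, symmetric distance matrix such as Bray-Curtis.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """One-factor PERMANOVA pseudo-F (Anderson 2001) from a full distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: sum of squared pairwise distances divided by n.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova_within_subject(dist, labels, subjects, n_perm=999, seed=1):
    """Return (pseudo-F, p-value) with labels permuted only within each subject."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    # Sample indices belonging to each subject (the permutation strata).
    blocks = {}
    for i, s in enumerate(subjects):
        blocks.setdefault(s, []).append(i)
    hits = 0
    for _ in range(n_perm):
        permuted = list(labels)
        for idx in blocks.values():
            shuffled = [labels[i] for i in idx]
            rng.shuffle(shuffled)
            for i, lab in zip(idx, shuffled):
                permuted[i] = lab
        if pseudo_f(dist, permuted) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

With three subjects each sampled at two timepoints, only 2^3 = 8 within-subject label arrangements exist, and two of them (the original and the complete swap) reproduce the observed pseudo-F, so the p-value stays near 0.25 however strong the time effect. Restricted permutations therefore need enough subjects to reach conventional significance levels.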

To assess changes in beta diversity over time, we computed the Bray-Curtis distance between samples from adjacent timepoints; for example, Wx-y denotes the Bray-Curtis distance between samples collected at Wx and Wy.
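The adjacent-timepoint distances can be sketched as below, assuming each participant's samples are stored as abundance vectors aligned to the same taxa and keyed by an integer week number (so that the `Wx-y` labels refer to weeks); the helper names are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def adjacent_timepoint_distances(profiles_by_week):
    """Map 'Wx-y' to the Bray-Curtis distance between consecutive weeks x and y.

    profiles_by_week: {week (int): abundance vector} for one participant.
    """
    weeks = sorted(profiles_by_week)
    return {
        f"W{x}-{y}": bray_curtis(profiles_by_week[x], profiles_by_week[y])
        for x, y in zip(weeks, weeks[1:])
    }
```

The distance is 0 for identical profiles and 1 for profiles that share no taxa, so larger `Wx-y` values indicate greater compositional turnover between consecutive visits.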

Linear mixed-effects model

To investigate longitudinal changes in the primary outcomes, we used linear mixed-effects models (LMMs). For microbiome outcomes, we used linear models for differential abundance analysis (LinDA) from the MicrobiomeStat package (version 1.2) in R (version 4.3.1), which applies a centered log-ratio (CLR) transformation and corrects for biases arising from compositional effects. Zero values were imputed rather than replaced with pseudo-counts, which can inflate the false-positive rate. Time was treated as a categorical variable for microbiome data, blood lipids and glycemic indices (except HbA1c), because changes were not assumed to be linear. The hypotheses tested and the corresponding models are described below.
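The CLR transformation applied before LinDA's linear models expresses each feature's log abundance relative to the geometric mean of its sample, which makes the result invariant to the sample's total count. A minimal sketch for a single sample of strictly positive values; it assumes zeros have already been handled (LinDA's own zero handling is not reproduced here), and `clr` is a hypothetical helper, not the MicrobiomeStat function.

```python
from math import log


def clr(abundances):
    """Centered log-ratio transform of one sample's strictly positive abundances."""
    if any(a <= 0 for a in abundances):
        raise ValueError("CLR requires strictly positive values; handle zeros first")
    logs = [log(a) for a in abundances]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]
```

Because the geometric mean is subtracted, the transformed values of each sample sum to zero, and multiplying every count in a sample by a constant leaves them unchanged.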

To detect differences in response between groups across discrete timepoints, we fitted the following fixed and random effects:

$$\text{Response} \sim \text{Treatment} * \text{Time} + \text{Sex} + \text{Age} + (1 \mid \text{SubjectID})$$
(3)

Within each group, to detect differences across time, we used the following formula:

$$\text{Response} \sim \text{Time} + \text{Sex} + \text{Age} + (1 \mid \text{SubjectID})$$
(4)

To detect longitudinal associations between each variable and the response, we fitted the following fixed and random effects:

$$\text{Response} \sim \text{Variable} + \text{Sex} + \text{Age} + (1 \mid \text{SubjectID})$$
(5)

p-values from the LMM and LinDA analyses were adjusted for multiple testing using the Benjamini-Hochberg procedure. For differential longitudinal trajectories between groups, q-values were calculated for the treatment-by-time interaction (“Group:Time”) coefficients; for correlative trends, q-values were calculated for the variable coefficients.
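The Benjamini-Hochberg adjustment multiplies the i-th smallest of m p-values by m/i and then enforces monotonicity from the largest p-value downward, capping the result at 1. The sketch below follows the same steps as R's p.adjust(method = "BH"); `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    # Indices sorted by ascending p-value; rank 1 is the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` returns approximately `[0.02, 0.04, 0.04, 0.02]`.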

Causal Mediation Analysis

To investigate causal mediation effects, we used the mediation package (version 4.5.0) in R (version 4.3.1) and retained effects that met three criteria: (1) the average causal mediation effect (ACME) and the average direct effect (ADE) had the same sign (ACME/ADE > 0), so that the mediation effect was in the same direction as the total effect; (2) the false discovery rate (FDR) of the total effect was below 0.25; and (3) the FDR of the forward or reverse mediation effect was below 0.05.
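The three filtering criteria can be expressed as a single predicate. A minimal sketch, assuming the ACME, ADE and the two FDR values have already been extracted for each tested mediator and direction; the `MediationResult` fields and the `passes_filters` helper are hypothetical and do not correspond to the mediation package's output object.

```python
from dataclasses import dataclass


@dataclass
class MediationResult:
    acme: float           # average causal mediation effect
    ade: float            # average direct effect
    fdr_total: float      # FDR-adjusted p-value of the total effect
    fdr_mediation: float  # FDR-adjusted p-value of the mediation effect (one direction)


def passes_filters(r, fdr_total_max=0.25, fdr_mediation_max=0.05):
    """Apply criteria (1)-(3) to one mediation result."""
    # ACME/ADE > 0 is equivalent to ACME * ADE > 0, which avoids dividing by zero.
    same_direction = r.acme * r.ade > 0
    return same_direction and r.fdr_total < fdr_total_max and r.fdr_mediation < fdr_mediation_max
```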

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.