Introduction

Several lines of evidence suggest that breastfeeding is prospectively associated with lower risk of cardiometabolic disease in both women and their offspring1,2,3,4. However, the mechanisms underlying these associations remain to be elucidated. Prior studies suggest that breastfeeding leads to reduced weight gain and lower midlife adiposity5,6,7,8. Furthermore, two studies in women with a history of gestational diabetes mellitus (GDM) found that blood phosphatidylcholines (PC) and branched-chain amino acids (BCAA) were inversely associated with breastfeeding duration9,10. These metabolites have been linked to the pathophysiology of type 2 diabetes (T2D) in previous studies (PC to lower risk and BCAA to higher risk). Another recent study in 350 women with GDM with 8 years of follow-up reported that higher intensity of breastfeeding was associated with lower glycerolipid (TAGs/DAGs) and higher phospho- and sphingolipid levels 6-9 weeks postpartum, but these changes were transient and only robust among women with GDM who did not subsequently develop T2D11. However, inference from previous studies is limited by small sample sizes, short follow-up periods from the last breastfeeding episode, and a focus on populations with GDM.

Metabolites are the downstream products of cellular activities regulated by the genome and modified by environmental factors, representing the confluence of genetic regulation, local environment, and enzymatic activity12, and have proven valuable in identifying perturbed metabolic pathways in disease pathophysiology13,14,15,16. While some metabolites are more transient, we and others have reported the long-term within-person stability of blood metabolites17,18,19,20, supporting the use of metabolomic biomarkers to elucidate disease pathophysiology and the biological links underlying exposure-risk associations. Accordingly, studies have investigated prospective associations of plasma metabolomic profiles, measured in blood samples collected years before disease onset, with the risk of developing chronic diseases21,22. Additionally, recent studies have investigated blood metabolomic profiles related to health behaviors, such as healthy lifestyle and dietary patterns, which have been further linked with risks of cardiovascular disease (CVD) and T2D23,24,25. However, no studies to date have systematically examined the associations between breastfeeding-related metabolites and future risk of T2D or CVD.

We hypothesized that associations of breastfeeding with cardiometabolic risk are mediated through changes in systemic metabolic pathways and homeostasis, specifically higher levels of lysophosphatidylcholines and lower levels of branched-chain amino acids, which have been associated with breastfeeding among women with GDM and involved in T2D development9. To test this hypothesis, we investigated associations of total duration of breastfeeding across all pregnancies with plasma metabolites using high-throughput metabolomics in two well-characterized cohorts of middle-aged to older women (Nurses’ Health Studies, NHS & NHSII), and derived a metabolite-based breastfeeding score and replicated this score in an independent cohort of postmenopausal women (Women’s Health Initiative, WHI) (Fig. 1). Subsequently, we investigated the prospective associations of metabolite-based breastfeeding score with the risk of T2D and CVD (as a composite endpoint of coronary heart disease, coronary revascularization, and stroke). In addition, we externally replicated the associations of this metabolite-based breastfeeding score with cardiometabolic disease in the PREvención con DIeta MEDiterránea (PREDIMED) trial and the WHI cohort.

Fig. 1: Flowchart of inclusion and metabolomic score derivation methodology.

We investigated associations of total duration of breastfeeding across all pregnancies with plasma metabolites using high-throughput metabolomics in two well-characterized cohorts of middle-aged to older women (Nurses’ Health Studies, NHS & NHSII), and derived a metabolite-based breastfeeding score and replicated this score in an independent cohort of postmenopausal women (Women’s Health Initiative, WHI). We further investigated the prospective associations of metabolite-based breastfeeding score with the risk of type II diabetes and cardiovascular disease. Results were externally replicated in the PREvención con DIeta MEDiterránea (PREDIMED) trial and the WHI cohort.

Results

Baseline characteristics for 159,684 NHS and NHSII participants included in the analysis of the association between self-reported breastfeeding and T2D and CVD are shown in the Supplementary Table S2, and the corresponding association analyses are presented in Supplementary Tables S3-4. Average age at baseline was 52.2 years in NHS and 35.8 years in NHSII. We documented 16,601 incident T2D cases over a period of 4,139,843 person-years and 15,488 incident CVD cases over a period of 4,197,137 person-years of follow-up in the NHS and NHSII. The baseline participant characteristics of the four cohorts included in the analyses for the associations between metabolite-based breastfeeding score and incident T2D and CVD are presented in Table 1. Average age at blood draw was 57.1 years in NHS, 44.8 years in NHSII, 67.0 years in WHI, and 67.8 years in PREDIMED. Participants in the NHS/NHSII were younger than participants in the other cohorts with 18% in the NHS and 78% in NHSII being premenopausal, while both WHI and PREDIMED consisted only of postmenopausal women. The distribution of women on postmenopausal hormones was similar in NHS and WHI, but only 2% of postmenopausal women in PREDIMED were on hormones at the time of blood collection. We documented 797 T2D and 613 CVD events in the NHS over a follow-up period of 20 and 24 years, respectively, 129 T2D and 81 CVD events in the NHSII over a period of 22 and 20 years, respectively, 1043 CVD events in the WHI and 138 T2D and 91 CVD events in the PREDIMED over a period of 3.8 and 4.7 years, respectively.

Table 1 Baseline participant characteristics from the NHS, NHSII, WHI and PREDIMED cohorts

Self-reported breastfeeding and risk of T2D, CVD

Longer lifetime total duration of self-reported breastfeeding was inversely associated with T2D (12+ vs 0 months, pooled adjusted [aHR] 0.85 (95%CI 0.82-0.89), p-trend <0.0001) and CVD (12+ vs 0 months, pooled aHR 0.93 (95%CI 0.88-0.97), p-trend =0.009) risks in the fully adjusted Cox models. Additional adjustment for AHEI or socioeconomic status (SES) did not attenuate the estimates (Supplementary Tables S3-4).

Metabolite-based breastfeeding score

Characteristics of the NHS and NHSII (n = 4349) and WHI (n = 2088) participants included in the analyses of metabolomic score derivation and replication are presented in Supplementary Table S5. Average age at blood collection was 56.6 years in NHS, 44.4 years in NHSII, and 67.0 years in WHI, with an average total lifetime duration of breastfeeding of 13.6 months in NHS/NHSII and 1.9 months in WHI among those who ever breastfed. Out of 181 metabolites, 5 metabolites were selected for the metabolite-based breastfeeding score by elastic net regression (with mean squared error = 1.39): a group of 3 highly correlated triacylglycerols (TAGs C54:2, C56:2, C56:3), cotinine, and indole-3-propionate. The triacylglycerols and cotinine were inversely associated with longer self-reported duration of breastfeeding, while indole-3-propionate was positively associated (Fig. 2D–E), and the metabolites correlating with longer duration of breastfeeding were associated with lower risks of T2D and CVD (Fig. 2E). In addition, when we examined the internal correlations between these 5 metabolites, indole-3-propionate was inversely correlated with the other 4 metabolites in the NHS/NHSII, WHI and PREDIMED (Fig. 2A–C). In a sensitivity analysis, we recalculated the metabolite-based breastfeeding score coefficients among control participants and observed similar beta coefficients for the selected metabolites.
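To make the construction of such a score concrete, the following is a minimal sketch, assuming the score is a weighted sum of standardized metabolite levels with the elastic net coefficients as weights. The weight values, the metabolite key names and the function names are hypothetical placeholders chosen for illustration and are not the coefficients estimated in this study; only the signs follow the directions of association reported above.

```python
from statistics import mean, stdev

# Hypothetical weights (illustrative only, NOT the study's coefficients).
# Signs mirror the reported directions: TAGs and cotinine inverse,
# indole-3-propionate positive.
WEIGHTS = {
    "TAG_C54_2": -0.05,
    "TAG_C56_2": -0.04,
    "TAG_C56_3": -0.03,
    "cotinine": -0.10,
    "indole_3_propionate": 0.08,
}


def zscore(values):
    """Standardize a list of values to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def breastfeeding_score(samples):
    """Return one score per participant.

    `samples` is a list of dicts mapping metabolite name -> measured level.
    Each metabolite is standardized across participants, then combined
    as a weighted sum using WEIGHTS.
    """
    z = {name: zscore([s[name] for s in samples]) for name in WEIGHTS}
    return [
        sum(WEIGHTS[name] * z[name][i] for name in WEIGHTS)
        for i in range(len(samples))
    ]


# Toy data: participant 0 has the lowest TAGs/cotinine and the highest
# indole-3-propionate, so it receives the highest score.
toy = [
    {"TAG_C54_2": 1, "TAG_C56_2": 1, "TAG_C56_3": 1, "cotinine": 1, "indole_3_propionate": 3},
    {"TAG_C54_2": 2, "TAG_C56_2": 2, "TAG_C56_3": 2, "cotinine": 2, "indole_3_propionate": 2},
    {"TAG_C54_2": 3, "TAG_C56_2": 3, "TAG_C56_3": 3, "cotinine": 3, "indole_3_propionate": 1},
]
scores = breastfeeding_score(toy)
```

Because every metabolite is standardized before weighting, the scores are centered on zero within the sample, which is why downstream associations are conveniently expressed per 1 SD increase in the score.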

Fig. 2: Pairwise correlation matrix across 5 selected metabolites for the score in the Nurses’ Health Studies (A), Women’s Health Initiative (B) and PREDIMED (C).

Weights from elastic net regression for the metabolomic score (D) and associations of metabolomic score components with history of breastfeeding, incidence of CVD and T2D (E) in the Nurses’ Health Studies. Selection of metabolomic score was based on 181 metabolites measured on C8-positive, HILIC-positive and HILIC-negative platforms among 4349 parous participants from NHS and NHSII cohorts using elastic net regression. Breastfeeding (BF) beta coefficients were estimated using linear regression from mutually adjusted model (all 5 metabolites entered in the same model) and represent the associations with longer self-reported breastfeeding; Cox proportional hazards regression models were used to estimate the beta coefficients of the associations between 5 selected metabolites with incident CVD and T2D. The models were adjusted for the following covariables at blood draw: age, fasting status, cohort (NHS, NHSII), endpoint and case-control status from the pooled 13 nested case-control studies. All statistical tests were two-sided, and did not adjust for multiple comparisons for this analysis. *p < 0.05, **p < 0.01, ***p < 0.001 after FDR-correction. Abbreviations: CVD, cardiovascular disease; NHS, Nurses’ Health Study; PREDIMED, Prevención con Dieta Mediterránea; T2D, type 2 diabetes. Source data are provided as Source Data file.

The metabolite-based breastfeeding score showed a modest but statistically significant and externally validated correlation with self-reported breastfeeding duration (r = 0.10, p < 0.0001 in the training set, r = 0.12, p < 0.0001 in the testing set, and r = 0.04, p = 0.046 in the external validation sets). When stratified by menopausal status and postmenopausal hormone use at blood draw, the score correlated with self-reported breastfeeding duration only among premenopausal women and postmenopausal women not on hormones (premenopausal women: r = 0.11, p < 0.0001, postmenopausal women not on hormones: r = 0.06, p = 0.05, as opposed to postmenopausal women on hormones: r = 0.01, p = 0.76). The same trend was observed in the WHI (postmenopausal women not on hormones: r = 0.05, p = 0.04, postmenopausal women on hormones: r = 0.03, p = 0.39). In a sensitivity analysis, we recalculated the metabolite-based breastfeeding score coefficients among control participants. The metabolite-based breastfeeding score developed including both cases and controls showed a correlation coefficient of 0.86 with the metabolite-based breastfeeding score developed including controls only in NHS/NHSII and 0.92 in WHI (Supplementary Fig. 1).

Metabolite-based breastfeeding score and T2D and CVD risk

In multivariable analyses adjusting for age, fasting status, race, age at 1st birth, pre-pregnancy BMI, family history of CVD or T2D, smoking status, alcohol intake, physical activity, parity, menopausal status and postmenopausal hormone and aspirin use at blood draw, the metabolite-based breastfeeding score was significantly inversely associated with incident T2D risk in all individual cohorts: in NHS (adjusted HR [aHR]=0.83[95%CI 0.76-0.90]), NHSII (aHR=0.55[0.46-0.65]) and PREDIMED (aHR=0.77[95%CI 0.61-0.99]; Fig. 3). When we meta-analyzed the estimates from the individual cohorts, 1 SD increase in the metabolite-based breastfeeding score was associated with 24% lower T2D risk (aHR=0.76[95%CI 0.71-0.82]). Significant heterogeneity across the 3 cohorts was observed with Cochran’s Q = 16.8, p = 0.0002, tau2 = 0.04. Additional adjustment for AHEI (aHR=0.77[95%CI 0.71-0.83]) or BMI at blood collection (aHR=0.84[95%CI 0.78-0.91]) did not change the associations with incident T2D risk in NHS/NHSII. Since cotinine is a metabolite of nicotine and therefore positively associated with cigarette smoking26, we conducted a sensitivity analysis examining the association between metabolite-based breastfeeding score excluding cotinine and T2D risk in NHS/NHSII and observed a significant inverse association with aHR=0.78 [95%CI 0.72, 0.84].
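The pooling step can be illustrated with a minimal sketch of an inverse-variance meta-analysis with Cochran's Q, applied to the three cohort-specific T2D estimates quoted above. The fixed-effect weighting and the function names are assumptions made for illustration; this section does not state which pooling model was used, and because the inputs are rounded published values the output only approximates the reported pooled estimate and heterogeneity statistic.

```python
import math


def log_hr_and_se(hr, lo, hi, z=1.96):
    """Recover the log hazard ratio and its standard error from a 95% CI."""
    return math.log(hr), (math.log(hi) - math.log(lo)) / (2 * z)


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling (an assumed model) with Cochran's Q.

    `estimates` is a list of (HR, lower 95% limit, upper 95% limit) tuples.
    """
    logs = [log_hr_and_se(*e) for e in estimates]
    weights = [1 / se ** 2 for _, se in logs]
    pooled = sum(w * b for w, (b, _) in zip(weights, logs)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    # Cochran's Q: weighted squared deviations of cohort estimates from the pooled one.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, logs))
    return {
        "hr": math.exp(pooled),
        "ci": (
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled),
        ),
        "q": q,
        "df": len(estimates) - 1,
    }


# Cohort-specific adjusted HRs for T2D per 1 SD of the score, as quoted in the text.
t2d = [
    (0.83, 0.76, 0.90),  # NHS
    (0.55, 0.46, 0.65),  # NHSII
    (0.77, 0.61, 0.99),  # PREDIMED
]
result = fixed_effect_meta(t2d)

# A hazard ratio translates to a percent difference in risk as (1 - HR) * 100.
percent_lower = (1 - result["hr"]) * 100
```

Under these assumptions the pooled hazard ratio and Q land close to the reported values, and the large Q relative to its 2 degrees of freedom reflects the much stronger association in NHSII than in the other two cohorts.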

Fig. 3: Associations between metabolomic score for breastfeeding and risk of T2D and CVD in the NHS/NHSII, WHI, and PREDIMED.

A Associations with T2D were assessed in 3 cohorts (n total = 4720): Nurses’ Health Study (n = 2404), Nurses’ Health Study II (n = 1772), and PREDIMED (n = 544) using Cox proportional hazards models. B Associations with CVD were assessed in 4 cohorts (n total = 6792): Nurses’ Health Study (n = 2404), Nurses’ Health Study II (n = 1772), Women’s Health Initiative (n = 2088) and PREDIMED (n = 528) using Cox proportional hazards models. The forest plots display the hazard ratios (dot) and 95% confidence intervals (line). 181 HILIC-positive, C8-positive and HILIC-negative metabolites were used in the elastic net regression to select 5 metabolites for the metabolomic score of breastfeeding in the derivation cohort (NHS + NHSII). Model 1 was adjusted for age and fasting status and stratified by endpoint and case-control status. Model 2 was additionally adjusted for race, age at 1st birth, pre-pregnancy BMI, family history of CVD/T2D, smoking status, alcohol intake, physical activity, parity, menopausal status and postmenopausal hormone and aspirin use at blood draw. All statistical tests were two-sided. Heterogeneity statistics for (A) T2D meta-analysis: model 1: Cochran’s Q = 25.8, p < 0.0001, tau2 = 0.05; model 2: Cochran’s Q = 16.8, p = 0.0002, tau2 = 0.04; for (B) CVD meta-analysis: model 1: Cochran’s Q = 5.19, p = 0.16, tau2 = 0.002; model 2: Cochran’s Q = 5.21, p = 0.16, tau2 = 0.003. Abbreviations: BMI, body mass index; CVD, cardiovascular disease; NHS, Nurses’ Health Study; PREDIMED, Prevención con Dieta Mediterránea; T2D, type 2 diabetes; WHI, Women’s Health Initiative. Source data are provided as Source Data file.

With respect to incident CVD risk, the metabolite-based breastfeeding score was suggestive of a slight inverse association in NHS (aHR=0.96[95%CI 0.87-1.05]), NHSII (aHR=0.95[95%CI 0.72-1.25]), WHI (aHR=0.84[95%CI 0.78-0.90]), and PREDIMED (aHR=0.86[95%CI 0.67-1.10]). The meta-analysis of the four cohorts resulted in a significant inverse association, with 1 SD increase in the metabolite-based breastfeeding score associated with a 12% lower CVD risk (aHR=0.88[95%CI 0.84-0.93]). No significant heterogeneity was observed across the 4 cohorts used for CVD meta-analysis (Cochran’s Q = 5.21, p = 0.16, tau2 = 0.003; p-het=0.10 and 0.18 for model 1 and 2 respectively). Additional adjustment for AHEI score (aHR=0.97[95%CI 0.89-1.06]) or BMI at blood collection (aHR=0.99[95%CI 0.90-1.08]) in the NHS/NHSII attenuated the association. Excluding cotinine from the metabolite-based breastfeeding score also attenuated the association (aHR=0.99[95%CI 0.91-1.08]).

Since women with a history of adverse pregnancy outcomes, such as GDM, preeclampsia, preterm birth, and pregnancy-induced hypertension, are at increased risk of T2D and CVD, we investigated the associations between the metabolite-based breastfeeding score and risk of T2D and CVD among this high-risk population in NHS/NHSII. Among the 2681 women who had a history of adverse pregnancy outcomes at the time of blood draw, the metabolite-based breastfeeding score was significantly inversely associated with T2D risk (aHR=0.53[95%CI 0.44-0.64]) but not associated with CVD risk (aHR=0.94[95%CI 0.70-1.25]).

Discussion

Leveraging data from three large cohorts of > 6000 parous women, we identified and validated a metabolomic signature in plasma samples collected in mid-life or later life that was associated with longer lifetime duration of breastfeeding. Importantly, using the prospective data from the four large cohorts, we unraveled inverse associations of this metabolomic signature with future risks of T2D and CVD even after the adjustment for known cardiometabolic risk factors. The metabolomic signature highlights the potential metabolic pathways through which breastfeeding is associated with T2D and CVD in US and Spanish populations of women.

While there are multiple metabolomics studies published on T2D and CVD27,28,29, there is a paucity of metabolomics studies in breastfeeding. One study reported that breastfeeding for 3+ months among women with GDM was associated with changes in the metabolic profile that have been linked to the early pathogenesis of T2D, such as higher total lysophosphatidylcholine/total phosphatidylcholine ratio, lower leucine and total branched-chain amino acid concentrations9. Zhang et al. observed a shift away from the glycerolipid towards phospho- and sphingolipid metabolism pathways among women with a history of GDM and greater exclusivity of breastfeeding as a potential mechanism underlying the metabolic benefits of breastfeeding in mothers11. In the same vein, we report inverse associations of self-reported breastfeeding with TAG components of the metabolomic score in our study. We also observed a stronger inverse association between the metabolite-based breastfeeding score and T2D risk among women with a history of adverse pregnancy outcomes, suggesting breastfeeding may have a differential effect, possibly a more beneficial one, in this high-risk population compared to women with average risk of T2D.

According to a recent systematic review leveraging evidence of metabolomics involvement in T2D, higher levels of indolepropionate, a tryptophan metabolite, were associated with lower T2D risk, after pooling hazard ratios from 8 prospective cohorts (pooled RR1-SD 0.82 95%CI 0.74-0.92)27. A conceivable explanation for the purported protective effect of indolepropionate against the onset of T2D may lie in two main factors. First, it can influence the secretion of incretin hormones from enteroendocrine L cells, notably glucagon-like peptide-1 (GLP-1), which are known to be pivotal in the development of T2D30,31. Second, indole-3-propionic acid demonstrates significant antioxidative stress capabilities, indicating a potential role in safeguarding β-cells against damage caused by metabolic and oxidative stress, and potentially mitigating amyloid accumulation31. Recent prospective and cross-sectional studies have also reported inverse associations between indole-3-propionate and CVD32,33. One possible explanation could be that indole-3-propionate modulates the pregnane X receptor (PXR), a xenobiotic-activated nuclear receptor present in vascular endothelium, which exerts an anti-inflammatory effect and can induce vasodilation34.

Positive associations have been observed between the triglycerides included in our metabolomics score (TG 54:2 and 56:3) and incident T2D risk (pooled RR1-SD 1.42 [0.74-0.92] and 1.22 [1.08-1.39], respectively)27. Emerging evidence supports the link between triglycerides with low double-bond content and low carbon number, i.e., saturated and monounsaturated acyl chains, and increased risk of T2D35, possibly due to differential response of TAGs with low vs high double bond content to insulin activity and sensitivity, both acutely and over time. In the Framingham Heart Study, TAGs of lower double bond content decreased in response to insulin action and were elevated in the setting of insulin resistance35. Similarly, in the context of CVD, TAGs with low double-bond content and shorter chain length, including TAG 54:2, were most consistently associated with higher CVD risk36,37, suggesting that the significance of specific TAG species in the context of cardiometabolic diseases might have been underestimated in previous research due to an undue emphasis on total triglyceride levels. Cotinine, a byproduct of nicotine metabolism, was selected into the metabolomic signature by elastic net regression likely due to the strong negative relationship between breastfeeding and smoking habits. However, removing cotinine from the metabolite-based breastfeeding score did not significantly change the results for T2D risk but attenuated the association for CVD risk.

Cotinine, the primary metabolite of nicotine, is widely recognized as a biomarker for smoking38. However, a previous study showed that never smokers can also be exposed to cotinine through dietary sources such as potatoes, tomatoes, and eggplant39, suggesting that cotinine is influenced not only by smoking but also by diet. In our study, the association between the metabolite-based breastfeeding score and T2D and CVD risks remained significant even after adjusting for smoking status, further supporting the role of cotinine beyond smoking exposure. Additionally, a sensitivity analysis excluding cotinine from the breastfeeding score yielded results consistent with our main analysis, reinforcing the robustness of our main findings.

The plasma metabolome serves as a reflection of the overall metabolic balance influenced by various factors such as diet40, genetic variability41, the microbiome42, and health status43. It is possible that there may be misclassification in the self-reported breastfeeding. However, such misclassification is likely to be non-differential and while the metabolome can be changed by breastfeeding, it is expected to be independent of reporting errors. Furthermore, we observed similar positive correlations in the validation dataset. As we observed similar effect estimates after additionally adjusting for BMI at blood collection, the observed associations between the metabolite-based breastfeeding score and risk of T2D and CVD seem to operate beyond adiposity at the time of metabolomic profiling. It is possible that hormonal changes due to breastfeeding may have in part led to consequent changes in the metabolic pathways resulting in reduced risk of T2D and CVD. Breastfeeding leads to elevation in prolactin levels44, which has been associated with decreased risk of T2D45. Circulating prolactin levels have wide effects on glucose metabolism and are inversely correlated with triglycerides and positively correlated with HDL-cholesterol46. Low prolactin levels have been associated with higher insulin resistance and beta-cell dysfunction46,47,48. Further mechanistic research is warranted to elucidate the underlying biological mechanisms linking breastfeeding and risks of T2D and CVD.

Our study has several strengths. First, leveraging multiple US cohorts and one Spanish cohort with comprehensive covariate data and long-term follow-up enhanced the geographic diversity of the sample, increasing the generalizability and validity of the results. Second, CVD endpoints were adjudicated by dedicated committees, strengthening the validity of this critical outcome. Third, the methods used to perform the metabolomic analysis have been shown to be reproducible17 and the metabolomics data were generated at the same laboratory for all cohorts. We were also able to replicate our metabolite-based breastfeeding score and its subsequent associations with T2D and CVD risk in external replication cohorts. Another strength of our analysis was the consideration of the metabolite-based breastfeeding score and its correlation with self-reported breastfeeding stratified by menopausal status at blood collection. However, several limitations merit consideration. Firstly, this analysis was limited as we focused solely on the 181 named metabolites with HMDB IDs from 3 untargeted metabolomics platforms (HILIC-positive, HILIC-negative, and C8-positive platforms) which were measured in > 4000 parous women in NHS/NHSII to maximize sample size. It is possible that other metabolites may mediate the observed associations between lifetime total breastfeeding duration and risks of T2D and CVD; this, however, does not undermine the validity of our findings. Future structural annotation of currently unidentified peaks may help identify new biomarkers associated with breastfeeding. Secondly, the elastic net regression assumed linear relationships between breastfeeding and metabolites, overlooking potential nonlinear relationships or interactions (product-terms) between metabolites. Despite the robust performance of the metabolomic signature, incorporating additional metabolites and advanced machine learning techniques considering nonlinear relationships or interactions could enhance the approach.
We examined the plasma metabolic profile at a single time point and acknowledge that, given the dynamic nature of human metabolites, repeated metabolomic assessments using blood samples collected at multiple timepoints may better inform the changes in metabolite profiles related to breastfeeding duration. However, we previously reported that the majority of the measured metabolites exhibit reasonable within-person stability over short and long periods of time17,18. Specifically, the 1-2-year and 10-year intraclass correlation coefficients (ICCs) for the metabolites selected into the score were > 0.4, similar to that of plasma cholesterol (10-year ICC = 0.39), a well-established CVD risk marker. Additionally, we acknowledge that evaluating plasma metabolites may not represent the metabolite profiles in the local tissue but rather system-wide profiles, which are more relevant to T2D and CVD development. Although we observed statistically significant correlations between the metabolite-based breastfeeding score and self-reported breastfeeding duration, the correlations were weak, and the significance is likely driven by the large sample size. However, we were able to replicate the significant positive correlation in an independent dataset (WHI), supporting the validity of the developed metabolite-based breastfeeding score. The model satisfied monotonicity and showed no evidence of gross misspecification, supporting the modeling of breastfeeding duration as continuous. However, we selected metabolites that are predictive of breastfeeding duration modeled as a continuous variable, and therefore it is possible that metabolomic biomarkers that are non-linearly associated with breastfeeding duration were not selected. Due to the observational design, causality could not be firmly established. While a strength of this study is the use of data from multiple cohorts with long-term follow-up, the metabolite-based breastfeeding score was developed using blood samples collected more than two decades ago.
Given the potential change in lifestyle patterns over the past decades which may influence blood metabolite levels, the metabolite-based breastfeeding score presented in our study may not be generalizable to more recent cohorts. To examine health outcomes after long-term follow-up, there is always the dilemma that exposures were assessed decades prior to the outcome of interest. However, the underlying biological mechanisms and link should not change over time. Thus, it remains critical to investigate long-term impacts of exposures and our study provides unique data presenting the potential biological link between breastfeeding and future T2D and CVD development. We acknowledge that we were not able to investigate breastfeeding intensity as we did not have full detailed information on exclusive breastfeeding for all the participants. Average total duration of breastfeeding was fairly short in our study population, and therefore we were also only able to look at 12 months or more as the highest exposure category, which may limit the generalizability of our findings and might underestimate the true effect of breastfeeding on cardiometabolic conditions. The majority of the study participants were white, limiting the generalizability of the study. Although cross-population reproducibility of the signature was assessed, further validation in racially and ethnically more diverse populations and exploration of associations with other chronic diseases are warranted. We acknowledge that the datasets used for the current analyses differ in study population including geographic location, study design, and detailed definition of the outcome. However, the strength of our study is that we observed similar direction of associations in these different datasets, which supports the reproducibility and validity of our findings. We acknowledge that the CVD outcome definitions differed across studies.
Since we only had blood metabolomics data at one timepoint, we were not able to account for potential time-varying effects of blood metabolite levels, which could differ between those who are at risk and those who are not at risk of T2D or CVD. As our data include participants 51 years old on average at blood collection, with blood samples collected > 10 years after breastfeeding exposure, we were able to investigate the potential systemic long-term impact of breastfeeding and its subsequent associations with T2D/CVD risk. We were not able to examine the breastfeeding-associated plasma metabolite signature in a younger cohort with blood samples collected more proximal to their last breastfeeding exposure, as this was beyond the scope of this study. Studies in younger cohorts are warranted to understand the short-term systemic impact of breastfeeding. We acknowledge the potential limitation of imputing missing values below the limit of detection with half the minimum value for that metabolite. Lastly, individuals who give birth to a surviving infant to whom they can provide breastmilk, who are healthy enough to breastfeed, and who have social support to breastfeed may differ fundamentally from those who do not and/or cannot breastfeed. We were unable to control for these important factors given the available data but acknowledge this limitation.
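The half-minimum imputation mentioned among the limitations can be sketched as follows, assuming that values below the limit of detection are recorded as missing (None) and that the minimum is taken over the observed values of the same metabolite. The function name and the use of None as the missing-value marker are illustrative assumptions, not details taken from the study's pipeline.

```python
def impute_half_min(values):
    """Replace missing entries (None) with half the smallest observed value.

    `values` holds the measurements of ONE metabolite across participants.
    If nothing was observed, the list is returned unchanged because no
    minimum exists to impute from.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    fill = min(observed) / 2
    return [fill if v is None else v for v in values]


# Toy example: the smallest observed level is 2.0, so missing values become 1.0.
imputed = impute_half_min([4.0, None, 2.0, None])
```

This approach assigns every below-detection sample the same value, which compresses variance at the low end of the distribution; that is the sense in which it is acknowledged as a potential limitation.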

In summary, based on consistent findings across multiple independent cohorts, our study demonstrates that longer lifetime total breastfeeding duration is associated with a metabolite-based breastfeeding score consisting of plasma metabolites measured during mid-life: C54:2 triglyceride, C56:2 triglyceride, C56:3 triglyceride, cotinine, and indole-3-propionate. The metabolite-based breastfeeding score was associated with lower risk of T2D and CVD in women beyond adiposity at the time of metabolomic profiling. Further investigation of the underlying biological pathways of the constituent metabolites will deepen our understanding of the biological mechanisms linking breastfeeding to cardiometabolic health.

Methods

Study populations

Primary analyses were performed in the prospective Nurses’ Health Studies (NHS/NHSII). The NHS was initiated in 1976, enrolling 121,700 female nurses aged 30-55 years49. The NHSII was established in 1989, enrolling 116,429 female US nurses aged 25–42 years50. In both cohorts, mailed questionnaires were administered biennially to assess reproductive and lifestyle factors and health status, with follow-up rates > 90%. Blood samples were collected from 32,826 NHS participants between 1989–1990 and 29,611 NHSII participants between 1996–1999 using standard protocols51,52. Within the NHS/NHSII, 13 nested case-control studies were previously conducted for blood metabolomic profiling (Supplementary Table S1)25,53. After excluding participants who were nulliparous or were missing metabolomics data or breastfeeding status, a total of 4349 parous participants remained for the derivation of the metabolite-based breastfeeding score. For the prospective T2D and CVD risk analyses, the baseline was set as the respective blood draw date for each participant. For these prospective analyses, participants with prevalent T2D, CVD, or cancer at baseline were additionally excluded, leaving a total of 2404 participants in the NHS and 1772 in the NHSII. We also conducted a subgroup analysis restricted to those with a history of adverse pregnancy events (i.e., gestational hypertension, preeclampsia, GDM; n = 2681) and examined the association between the metabolite-based breastfeeding score and T2D and CVD risk. The study protocol was approved by the institutional review boards of the Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health, and those of participating registries as required. The present study used anonymized data that were originally collected and was therefore considered non-human research by the IRBs.

External replication was performed for T2D risk in a nested case-cohort study within the Prevención con Dieta Mediterránea (PREDIMED)54 study, and for CVD risk in a nested case-control study of coronary heart disease within the Women’s Health Initiative (WHI)53 and in a nested case-cohort study within the PREDIMED study.

The WHI study enrolled 161,808 U.S. postmenopausal women aged 50 to 79 years from 1993 to 1998 in an observational study (WHI-OS) or one or more of three randomized controlled trials55,56. Participants completed baseline socio-demographic, diet, lifestyle, and medical history questionnaires. Plasma samples were collected at enrollment using EDTA tubes, processed immediately, and stored in −70 °C freezers53. The current study utilized data from an ancillary 1:1 matched coronary heart disease case-control study nested within the WHI, which included 2306 participants with blood samples53. Women with a history of CVD or cancer at baseline, without metabolomic profiling, who were never pregnant, or with missing breastfeeding status were excluded. A total of 2088 participants were included in the WHI replication analyses. The protocol was approved by the Fred Hutchinson Cancer Research Center Institutional Review Board, Seattle, WA. Written informed consent was obtained from all participants.

PREDIMED, a multicenter randomized controlled trial among individuals at high cardiovascular risk, was carried out in Spain from 2003 to 2010 and examined the effects of the traditional Mediterranean diet on the primary prevention of CVD, with T2D as a secondary outcome. The primary outcomes have been published elsewhere54,57. Fasting plasma EDTA samples were collected at baseline, processed at each recruiting center no later than 2 h after collection, and stored in −80 °C freezers58. The current study included 528 women for the prospective CVD analyses and 544 women for the T2D analyses from two nested case–cohort studies (CVD and T2D outcomes) with metabolomics profiling. The IRB of Hospital Clinic (Barcelona, Spain) approved the study protocol, and written informed consent was obtained from all participants. The flow chart of included studies (primary and validation) and the analysis approach are presented in Fig. 1.

Plasma metabolite profiling

Plasma metabolomic profiling for NHS, NHSII, WHI, and PREDIMED was performed in the same laboratory at the Broad Institute of MIT and Harvard (Cambridge, MA, USA) using a liquid chromatography-mass spectrometry (LC-MS) platform as described elsewhere17,25,53. We excluded metabolites that were unstable due to delayed processing17. Missing metabolite values below the limit of detection were imputed with half the minimum value for that metabolite, separately within each case-control study. We used the inverse-normal transformation within each case-control study to correct for batch effects and to scale metabolites to the same range. The final number of named metabolites available for all 4349 participants, measured on the HILIC-positive, HILIC-negative, and C8-positive platforms in NHS/NHSII, was 181.
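The two preprocessing steps described above can be illustrated with a minimal Python sketch. It is not the study's code. The function names are hypothetical, missing values are represented as None, and the rank-based form of the inverse-normal transformation with a Blom offset (c = 3/8) is an assumption, because the text does not specify which variant was used. In practice each function would be applied to one metabolite at a time within a single case-control study.

```python
from statistics import NormalDist
from typing import List, Optional


def impute_half_min(values: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with half the minimum observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a metabolite with no observed values")
    fill = min(observed) / 2.0
    return [fill if v is None else v for v in values]


def inverse_normal_transform(values: List[float], c: float = 3.0 / 8.0) -> List[float]:
    """Rank-based inverse-normal transform; tied values share their average rank.

    The offset c = 3/8 (Blom) is an assumption, not taken from the source.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]
```

Calling impute_half_min first and then inverse_normal_transform on the result mirrors the order described in the text. Imputed values are tied at the bottom of the distribution and therefore receive the same transformed value.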

Assessment of lifetime total breastfeeding duration

In the NHS, breastfeeding history was assessed once, in 1986, when most of the women had completed their reproductive lifespan. Participants were asked to report the lifetime total duration of breastfeeding for all pregnancies as a categorical variable: “cannot remember” (considered as missing breastfeeding information), “did not breastfeed”, “<1”, “1–3”, “4–6”, “7–11”, “12–17”, “18–23”, “24–35”, “36–47”, and “≥48 months”. Participants in the NHSII reported their breastfeeding duration in three follow-up questionnaires (1993, 1997, and 2003), using the same categories as in the NHS. The 1993 questionnaire asked about lifetime breastfeeding history. The 1997 questionnaire asked for detailed information about breastfeeding history for each birth59. In 2003, women who reported pregnancies subsequent to 1997 were asked to provide breastfeeding duration in a supplementary questionnaire. We used the breastfeeding history data from the questionnaire cycle most proximal and prior to blood collection. To allow harmonization with the NHS data, in the NHSII we calculated cumulative breastfeeding duration by summing the breastfeeding duration after each birth for which the participant reported any breastfeeding prior to blood collection. We used the following categories of cumulative lifetime breastfeeding duration: 0 months, 1–6 months, 7–11 months, and 12+ months. Previous studies have demonstrated that both self-reported breastfeeding initiation and duration are highly reliable60,61. In the WHI, women who reported at enrollment having at least one live birth and who were not missing information on ever breastfeeding were included. Women were asked “Thinking about all the children you breastfed, how many months total did you breastfeed?” Responses were recorded as a categorical variable indicating cumulative lifetime duration of breastfeeding: 0, 1–6 months, 7–12 months, and 13+ months. In PREDIMED, no data on lifetime total breastfeeding duration were available.
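The NHSII harmonization step (summing per-birth durations, then grouping into the four NHS/NHSII analysis categories) could look roughly like the following Python sketch. It assumes that each birth-level report has already been converted to whole months. The source does not say how the categorical questionnaire responses were converted, so that step is left out. The function name and the handling of missing reports are hypothetical.

```python
from typing import Iterable, Optional


def cumulative_category(months_per_birth: Iterable[Optional[int]]) -> Optional[str]:
    """Sum per-birth breastfeeding months and bin the total into analysis categories."""
    durations = list(months_per_birth)
    # Assumption: an absent or incomplete birth-level history makes the total missing.
    if not durations or any(m is None for m in durations):
        return None
    total = sum(durations)
    if total == 0:
        return "0 months"
    if total <= 6:
        return "1-6 months"
    if total <= 11:
        return "7-11 months"
    return "12+ months"
```

For example, a participant reporting 6 and 5 months for two births would fall in the 7–11 month category, whereas 6 and 6 months would fall in the 12+ month category. The WHI cut points (7–12 and 13+ months) differ slightly and would need their own thresholds.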

Ascertainment of type 2 diabetes (T2D)

In the NHS, a supplementary questionnaire was mailed to women who reported physician-diagnosed diabetes on a baseline or any biennial questionnaire62. A validation study demonstrated a high level of confirmation (98%) of self-reported T2D63. In accordance with the National Diabetes Data Group64, diagnosed cases were required to meet one of the following criteria: a) an elevated glucose concentration (i.e., fasting plasma glucose ≥7.8 mmol/L, random plasma glucose ≥11.1 mmol/L, or plasma glucose ≥11.1 mmol/L at 2 or more hours after an oral glucose load) and at least one symptom related to diabetes (i.e., excessive thirst, polyuria, weight loss, or hunger); b) in the absence of symptoms, at least two elevated glucose concentrations on different occasions; or c) treatment with insulin or oral or other hypoglycemic medication. For cases of T2D identified after 1998, the revised American Diabetes Association criteria were applied using the fasting glucose cutoff of 7.0 mmol/L65.
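As an illustration only, the decision logic of criteria a) to c) can be written as a small Python function. This is a simplified sketch rather than the cohort's confirmation procedure. The class and function names are hypothetical, each reading is assumed to come from a different occasion, and the only change applied for later cases is the lower fasting cutoff. Reading "after 1998" as a diagnosis year greater than 1998 is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GlucoseReading:
    kind: str          # "fasting", "random", or "ogtt_2h"
    mmol_per_l: float  # plasma glucose concentration


def is_elevated(reading: GlucoseReading, fasting_cutoff: float) -> bool:
    """Apply the thresholds quoted in the text to a single reading."""
    if reading.kind == "fasting":
        return reading.mmol_per_l >= fasting_cutoff
    if reading.kind in ("random", "ogtt_2h"):
        return reading.mmol_per_l >= 11.1
    raise ValueError(f"unknown reading kind: {reading.kind}")


def meets_t2d_criteria(
    readings: List[GlucoseReading],
    has_symptom: bool,
    on_hypoglycemic_treatment: bool,
    diagnosis_year: int,
) -> bool:
    # Assumption: "after 1998" is read as diagnosis_year > 1998.
    fasting_cutoff = 7.0 if diagnosis_year > 1998 else 7.8
    n_elevated = sum(is_elevated(r, fasting_cutoff) for r in readings)
    if on_hypoglycemic_treatment:         # criterion c
        return True
    if has_symptom and n_elevated >= 1:   # criterion a
        return True
    return n_elevated >= 2                # criterion b (readings assumed to be on different occasions)
```

Under these assumptions, a symptomatic participant with a single fasting glucose of 7.2 mmol/L would not qualify in 1995 but would qualify in 2005.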

In the PREDIMED trial, new diagnoses of T2D during follow-up were adjudicated by the Clinical Endpoint Committee (blinded to the intervention group). The American Diabetes Association criteria, namely, two confirmations of fasting plasma glucose ≥7.0 mmol/L or 2 h plasma glucose ≥11.1 mmol/L after a 75 g oral glucose load, were used to adjudicate cases.

Ascertainment of cardiovascular disease (CVD)

In the NHS and NHSII, CVD was defined as a combined endpoint of non-fatal or fatal myocardial infarction (MI), stroke, coronary artery bypass graft surgery (CABG), or percutaneous coronary intervention (PCI). When a participant (or a family member of a deceased participant) reported an incident event, permission was obtained for the medical records to be examined by physicians who were blinded to the participant’s risk factor status. For each endpoint, the month and year of diagnosis were recorded as the diagnosis date. Non-fatal events were confirmed through review of medical records. MI was confirmed according to the WHO criteria66 on the basis of symptoms and diagnostic electrocardiogram changes or elevated cardiac enzymes. Strokes were confirmed according to the National Survey of Stroke criteria67 as a neurological deficit with sudden or rapid onset that persisted for > 24 h or until death. Deaths were identified by reports from families, the U.S. postal system, or death certificates obtained from state vital statistics departments and the National Death Index, and were confirmed through review of medical records or autopsy reports. Follow-up for deaths was > 98% complete68.

In the WHI, coronary heart disease (CHD) was defined as incident MI or death attributable to coronary heart disease53,69. CHD outcomes were adjudicated by physicians who reviewed the medical history, electrocardiogram readings, and the results of cardiac enzyme/troponin determinations. The controls were frequency matched on 5-year age group, race/ethnicity, hysterectomy status, and 2-year enrollment window. Women included in the WHI dataset were drawn from a prior nested case-control study of plasma metabolomics and incident CHD, and all were free of CVD at study baseline53.

In PREDIMED, CVD was defined as a composite of MI, stroke, or cardiovascular death. Every year, study physicians who were blinded to the intervention status used four information sources to identify incident CVD cases: follow-up contacts with participants, family doctor contacts, an annual examination of medical records, and consultation of the National Death Index. A blinded central Event Ascertainment Committee received anonymized data and adjudicated the events.

Assessment of covariates

Information on potential risk factors, including medical, demographic, and reproductive histories, lifestyle practices, and body weight, was collected and updated through the NHS and NHSII biennial questionnaires. The exposure assessed closest prior to the blood draw was used in analyses. Parity was defined as the number of pregnancies lasting > 6 months and was updated through follow-up questionnaires. As a surrogate measure of pre-pregnancy body mass index (BMI), BMI at age 18 years was calculated as self-reported weight at that age (kg) divided by the square of height (m²). Data on pre-pregnancy BMI were not available in the WHI and PREDIMED. In PREDIMED, family history of T2D and further details on parity and age at first birth were not available. Physical activity in metabolic equivalent (MET) hours was calculated based on participants’ reported average weekly time spent engaging in activities over the past year70. The Alternate Healthy Eating Index (AHEI) included 11 components and was calculated from the Food Frequency Questionnaire, with details described elsewhere71. Cigarette smoking status was self-reported as current, past, or never. Alcohol intake was self-reported as total grams of alcohol per day. Self-reported menopausal status was collected at blood collection: premenopausal, postmenopausal without hormone therapy, postmenopausal with hormone therapy use, or unknown.

Statistical analysis

This study aimed to comprehensively examine the relationship between plasma metabolites and breastfeeding duration, as well as its long-term impact on CVD and T2D risk. First, we developed a metabolite-based breastfeeding score in NHS/NHSII and replicated this result in WHI. We then assessed the association between the breastfeeding score and CVD/T2D risk in the NHS/NHSII and replicated these results in two independent cohorts, WHI and PREDIMED.

First, in NHS/NHSII, we examined the associations between self-reported breastfeeding and risk of T2D59 and CVD72, adjusting for a priori confounders that were adjusted for in previous publications on breastfeeding and T2D or CVD4,59,72,73. These analyses included all women with available self-reported breastfeeding status (n = 159,684), irrespective of whether they had plasma metabolomic data. Cox regression models were used to calculate the hazard ratios (HRs) and 95% confidence intervals (CIs). Model 1 adjusted for age. Model 2 additionally adjusted for race (white participants vs people of color), family history of T2D or CVD (yes/no), pre-pregnancy BMI (≤25, 26–30, 31+), parity (1, 2, 3+), age at first birth (<25, 25–29, 30–34, 35+), and time-varying smoking status (never, past, current), physical activity, alcohol intake, menopausal status, and post-menopausal hormone use. Model 3 additionally adjusted for AHEI dietary score. Model 4 was model 2 additionally adjusted for SES score. The SES score included 9 variables: median household income, median home value, percent with a college degree, percent white people, percent Black people, percent of foreign-born residents, percent of families receiving interest or dividends, percent of occupied housing units, and percent unemployed74.

To derive a metabolite-based breastfeeding score, individual metabolite values were transformed to probit scores using the inverse normal transformation. Then, the NHS/NHSII dataset, which includes both incident cases and controls from 13 nested case-control studies within the NHS/NHSII (Supplementary Table S1), was split into training and testing (for internal validation) sets in a 70%:30% ratio. Elastic net regression was used to select the breastfeeding-specific metabolites in the NHS/NHSII training set within a 10-fold cross-validation framework, with lifetime cumulative breastfeeding duration (0 months, 1–6 months, 7–11 months, 12+ months) treated as a continuous linear outcome; the resulting model was then applied to the NHS/NHSII testing set to calculate the metabolite-based breastfeeding score. The 10-fold cross-validation (CV) approach was used to select the optimal λ (lambda) tuning parameter, to minimize overfitting, and to determine the optimal beta coefficients in the linear regression75,76. The metabolite-based breastfeeding score was calculated as the weighted sum of the selected metabolites with weights equal to the elastic net regression coefficients. To identify a disease-agnostic breastfeeding score and to avoid overadjustment, we did not adjust for T2D and CVD risk factors in the score development. However, these were accounted for when examining the associations of the metabolite-based breastfeeding score with risk of CVD and T2D. As a sensitivity analysis, we performed LASSO and ridge regressions to select breastfeeding-specific metabolites following the same analytical framework as for elastic net regression. Pairwise correlations across metabolites were calculated using Pearson correlation within each cohort. We examined the correlations between the individual metabolites selected in the metabolite-based breastfeeding score using elastic net in each of the three cohorts to evaluate the consistency of correlation patterns across the three cohorts. We also examined the correlations between the metabolite-based breastfeeding score and the self-reported lifetime total duration of breastfeeding using the Pearson correlation coefficient in the training, testing, and external validation (WHI) sets.
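The steps around the elastic net fit can be sketched in plain Python as follows: a 70:30 split, the score as a weighted sum, standardization to a z-score, and the Pearson correlation with breastfeeding duration. The elastic net fitting itself is omitted. All function names are hypothetical, and any weights passed in would be placeholders rather than the study's coefficients. Coding the four duration categories as 0, 1, 2, 3 for the correlation is likewise an assumption.

```python
import random
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def split_train_test(ids: Sequence[int], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split participant IDs into training and testing sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def breastfeeding_score(metabolites: Dict[str, float],
                        weights: Dict[str, float]) -> float:
    """Weighted sum of the selected (already transformed) metabolite values."""
    return sum(w * metabolites[name] for name, w in weights.items())


def standardize(scores: Sequence[float]) -> List[float]:
    """Convert scores to z-scores so associations can be read per 1-SD increment."""
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / sd for s in scores]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

In this sketch the weights would be estimated in the training IDs only and then passed unchanged to breastfeeding_score for participants in the testing set or an external cohort.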

We then examined the associations between the metabolite-based breastfeeding score and incident T2D and CVD by multivariable Cox regression in NHS/NHSII. The score was standardized to a z-score to interpret the associations per 1-SD unit increment. The Cox regressions were stratified by case-control status in the original sub-study and adjusted for age and fasting status (model 1). Model 2 was additionally adjusted for age at first childbirth, pre-pregnancy BMI (continuous), race (white participants or people of color), smoking status (current, past, or never), physical activity (continuous), alcohol intake (continuous), family history of disease (CVD for CVD endpoint and T2D for T2D endpoint), parity, menopausal status, and postmenopausal hormone therapy and aspirin use at baseline. Additionally, we performed sensitivity analyses adjusting the main model for the Alternate Healthy Eating Index (AHEI) score (continuous) and BMI at blood collection. As metabolites have been shown to be affected by postmenopausal hormone use77, we 1) derived the metabolomic score among postmenopausal women only to examine the robustness of the metabolite selection, and 2) stratified the correlation analyses between the metabolomic score and self-reported duration of breastfeeding by menopausal status (premenopausal women, postmenopausal women on hormones, postmenopausal women not on hormones). The person-time for each participant was calculated from the blood collection date (baseline) until the date of CVD/T2D diagnosis or end of follow-up (CVD: June 2022 in the NHS and June 2019 in the NHSII; T2D: June 2018 in the NHS and June 2021 in the NHSII), whichever came first. We conducted external replication of the association between the metabolite-based breastfeeding score, calculated using the weights from the NHS/NHSII, and T2D risk in PREDIMED and CVD risk in WHI and PREDIMED. Subsequently, we conducted a random-effects meta-analysis of the risk estimates across the 4 studies (NHS, NHSII, WHI, PREDIMED). Between-study heterogeneity was explored using τ² and I² statistics. Analyses were performed using R version 4.2.0, SAS 9.4 for UNIX (SAS Institute Inc), and Stata v16.0. All statistical tests were two-sided. To account for multiple comparisons, we used the Benjamini-Hochberg procedure to control the false discovery rate (FDR) at < 0.05.
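For readers less familiar with the pooling and multiple-testing steps, the following Python sketch shows one common way to compute them. It is not the authors' R, SAS, or Stata code. The text does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator is assumed here, and the function names and inputs (per-study log hazard ratios with their standard errors) are hypothetical.

```python
from math import exp, sqrt
from typing import Dict, List, Sequence


def random_effects_meta(log_hrs: Sequence[float],
                        ses: Sequence[float]) -> Dict[str, float]:
    """DerSimonian-Laird random-effects pooling with tau^2 and I^2 (assumed estimator)."""
    w = [1.0 / se ** 2 for se in ses]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))  # Cochran's Q
    df = len(log_hrs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0        # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # % of variation due to heterogeneity
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sum(w_star)
    se_pooled = sqrt(1.0 / sum(w_star))
    return {
        "hr": exp(pooled),
        "ci_low": exp(pooled - 1.96 * se_pooled),
        "ci_high": exp(pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank  # largest rank whose p-value passes its threshold
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= max_rank
    return rejected
```

When the study estimates agree closely, τ² and I² are both zero and the pooled estimate coincides with the fixed-effect result. Larger values indicate that more of the observed variation reflects between-study differences rather than sampling error.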

Further information on the research design is available in the GATHER checklist linked with this manuscript.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.