Abstract
Background
Associations between premenopausal plasma metabolites and breast cancer incidence are largely unknown.
Methods
We conducted a prospective, matched case-control study in which we measured pre-diagnostic metabolomic profiles among predominantly premenopausal women in the Nurses’ Health Study II (n = 2010). Lipids, carbohydrates, and organic acid-related metabolites (n = 218) were profiled via liquid chromatography-tandem mass spectrometry. Conditional logistic regression was used to estimate odds ratios (OR) for associations between individual metabolites and breast cancer incidence. Associations with metabolite groups were assessed using metabolite set enrichment analysis (MSEA).
Results
Six individual lipid-related metabolites were nominally associated with breast cancer incidence (taurodeoxycholate [OR for per 1 standard deviation increase in metabolite level = 1.15, 95% CI = 1.04–1.28]; C16:1 cholesteryl ester [OR = 0.88, 95% CI = 0.79–0.97]; three phosphocholine (PC)-related metabolites, C34:1 PC [OR = 0.87, 95% CI = 0.78–0.98], C34:3 PC [OR = 0.88, 95% CI = 0.79–0.98], C32:1 PC [OR = 0.88, 95% CI = 0.79–0.98]; indoxyl sulfate [OR = 0.90, 95% CI = 0.82–1.00]). In MSEA analyses, triglycerides (TAGs) with <3 double bonds (normalized enrichment score (NES) = −2.54) and PCs (NES = −2.12) were inversely associated with breast cancer incidence overall and across subgroups. Phosphatidylethanolamine (PE) plasmalogens (NES = 1.83) and PC plasmalogens (NES = 2.23) were positively associated with breast cancer incidence.
Conclusions
Premenopausal plasma TAGs, PCs, and plasmalogen metabolites were associated with breast cancer incidence. Further validation in independent cohorts is warranted.
Similar content being viewed by others
Introduction
Breast cancer disease burden is considerable. It is the most common cancer diagnosed among women in 140/180 countries, including U.S. [1]. The etiology of breast cancer is complex, and strong evidence indicates that metabolic processes play a key role [2, 3]. Importantly, associations with established metabolic-related risk factors appear to vary based on menopausal status [4, 5]. For example, obesity is consistently positively associated with postmenopausal breast cancer and inversely associated with premenopausal breast cancer incidence [6, 7]. However, metabolic-related factors and biomarkers are still understudied, especially among premenopausal women. Metabolomics is the comprehensive analysis of small molecules in biological specimens [8]. It can systematically provide a functional readout of upstream changes (genetic, transcriptomic, proteomic) and can reflect potential interaction signals with environmental factors. Therefore, this powerful approach has the potential to offer new insights into the metabolic pathways involved in breast cancer development [5, 8].
Several epidemiological studies have evaluated associations between circulating metabolites and breast cancer incidence [5, 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. However, most studies included only postmenopausal women or did not present results stratified by menopausal status. We previously published a study among predominantly premenopausal women in the Nurses’ Health Study (NHS) II that focused on the circulating amino acid and amino acid-related metabolites only. We observed that 2-aminohippuric acid, DMGV, kynurenic acid, phenylacetylglutamine, and piperine were inversely associated with breast cancer incidence, while creatine and C40:7 PE plasmalogen were positively associated with breast cancer risk [5]. Yet, associations with other metabolite classes, such as circulating lipids, carbohydrates, and organic acids, remain unexplored in NHSII. In this study, we aimed to investigate the associations between pre-diagnostic levels of these metabolites and subsequent incidence of breast cancer among women in the NHSII who were predominately premenopausal at blood collection.
Methods
Study population
The NHSII [24] was established in 1989 with 116,429 female registered nurses aged 25–42 years enrolled. Participants were followed by mailed questionnaires every 2 years to collect medical, reproductive, lifestyle, and dietary information. In 1996–1999, 29,611 NHSII participants aged 32–54 years contributed blood samples,18,521 of whom donated timed samples within the menstrual cycle, as previously described [25]. All samples were collected and shipped overnight to the Channing laboratory for further processing and archiving of blood cells and plasma aliquots in liquid nitrogen (<130 °C) freezers [25].
We conducted a matched case-control study nested within the NHSII blood sub-cohort. Eligible women participated in the NHSII blood sub-cohort and were free of cancer (except for nonmelanoma skin cancer) at the time of blood collection. Incident breast cancer cases were identified after blood collection and before 2012. The follow-up in the blood sub-cohort is high (96% in 2011) [25, 26]. The median interval from blood collection to diagnosis was 9 years. Breast cancer cases were reported by the participant and then confirmed by medical record reviews [5, 10]. Controls were selected via risk-set sampling (i.e., incidence density sampling with matching on time). Specifically, one control who was still at risk (not diagnosed with breast cancer) and under follow-up at the time of the case’s diagnosis was individually matched to each breast cancer case on the following factors ascertained at blood collection: age, month, time of day, race/ethnicity, fasting status, luteal day (for samples timed in the menstrual cycle), hours since last meal, and combined menopausal status and postmenopausal hormone use. Additional details on the selection of cases and controls are provided in the Supplemental Methods.
Ethics approval and consent to participate
The study protocol was approved by the institutional review boards (IRB No.1999P003389) of the Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health, and those of participating registries, as required. The informed consent was implied by participants’ return of the questionnaires and blood samples. All study methods were performed in accordance with the relevant recognized ethical guidelines (Declaration of Helsinki).
Metabolites profiling
Plasma metabolites were profiled at the Broad Institute of MIT and Harvard (Cambridge, MA) using a liquid chromatography-tandem mass spectrometry (LC-MS) method [27,28,29]. Briefly, for the C8-positive platform, plasma lipids were profiled using a Nexera X2 U-HPLC (Shimadzu Corp.; Marlborough, MA) coupled to a Q Exactive Plus mass spectrometer (Thermo Fisher Scientific; Waltham, MA). Lipids were extracted from plasma (10 µL) using 190 µL of isopropanol containing 1,2-dodecanol-sn-glycerol-3-phosphocholine (Avanti Polar Lipids; Alabaster, AL). After centrifugation, supernatants (2 µL) were injected directly onto a 100 ×2.1 mm, 1.7 µm ACQUITY BEH C8 column (Waters; Milford, MA). MS analyses were carried out using electrospray ionization in the positive ion mode using full scan analysis over 200–1100 m/z. Lipid identities were denoted by the total acyl carbon number and the total number of double bond numbers. For the HILIC-negative platform, HILIC analyses of carbohydrate and organic acids metabolites in the negative ionization mode were conducted using an LC-MS system comprised of a Nexera X2 U-HPLC (Shimadzu Corp.; Marlborough, MA) coupled to a 5500 QTRAP mass spectrometer (Thermo Fisher Scientific; Waltham, MA). Other procedures were similar to the C8-positive platform. Our previously analyzed amino acids and derivatives, which were included in the presented grouped metabolites analysis, were measured through the HILIC-positive ionization mode as previously described [5].
Pooled reference samples were included once for every 20 samples, and results were standardized using the ratio of the value of the sample to the value of the nearest pooled reference, multiplied by the median of all reference values for the metabolite. All samples were run together, with matched case-control pairs (as sets) distributed randomly within the same batch, and the order of the case and controls within each pair randomly assigned. Therefore, the case and its control were always directly adjacent to each other in the analytic run, thereby limiting variability in platform performance across matched case-control pairs [5]. Given this, no additional batch corrections were performed for this present breast cancer case-control project. In addition, 238 quality control (QC) samples, randomly distributed among the samples, were profiled [5]. After excluding broken case-control pairs (n = 5), 1055 cases and 1055 matched controls measured on the C8-positive and HILIC-negative platforms were included in our analytical set.
Covariate information
In order to estimate associations independent of established breast cancer risk factors, we incorporated information on several covariates. Information regarding participant demographic characteristics, reproductive history, medical history, smoking history, weight, height, and physical activity was self-reported and updated on the biennial follow-up questionnaires and at blood collection [25, 26]. Body mass index (BMI, kg/m2) was calculated using height (m) reported at baseline and weight (kg) reported at blood collection. Tumor estrogen receptor (ER) expression status was evaluated by immunohistochemistry (IHC) on validated tumor microarrays when possible or extracted from medical records if IHC data were not available [30].
Statistical analysis
Selection of metabolites
We first excluded metabolites that were not stable with the delayed processing inherent in the blood collection methods (n = 43) [28]. Moreover, 16 metabolites had ≥10% missing values and were excluded from the main analysis, because these metabolites may not have been sufficiently well-measured. We also excluded non-lipid metabolites included in our previously published paper in the same case-control set (n = 9); we did not exclude lipid metabolites included in our previously published paper (n = 40), given the focus on lipid metabolites in this analysis [5]. In total, 218 metabolites (C8-positive = 177; HILIC-negative = 41) were included in the primary analysis. All these metabolites exhibited good reproducibility (intraclass correlation coefficient [ICC] range 0.59–1.00) within-person over 1–2 years [28]. Most of the metabolites (n = 210) had a coefficient of variation (CV) < 25% and an ICC > 0.4 among QC samples. Of these included metabolites, 152 had no missing values, and 66 metabolites had <10% missing values. Missing values were imputed by one-half of the lowest observed value per metabolite.
Analysis of associations
We conducted two types of association analyses: 1) an analysis of individual metabolites and 2) an analysis of groups of metabolites.
In individual metabolite analyses, conditional logistic regression models were used to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for associations between per 1 SD increase in each standardized metabolite level and breast cancer incidence. We fit three different models to investigate the extent to which associations were independent of breast cancer risk factors. In the first model, we did not include additional covariates. In the second model, we adjusted for BMI at age 18 and weight change from age 18 to the time of blood draw. In the third model, the following additional breast cancer risk factors were included: age at menarche, parity, age at first birth, breastfeeding history, family history of breast cancer, personal history of benign breast disease, physical activity, alcohol consumption, and oral contraceptive use.
In analyses of grouped metabolites, we first performed a metabolite set enrichment analysis (MSEA) based on the 218 metabolites that we analyzed individually. The MSEA combined the results from individual metabolites’ logistic regressions by pre-defined groups (annotated at the Broad Institute) to generate a normalized enrichment score (NES) adjusted for metabolites group size [31]. The NES represents the level to which the metabolite set is over-enriched compared to other groups; a positive NES represents a positive association with breast cancer, whereas a negative score indicates a group that is negatively associated with breast cancer [9, 31]. We also evaluated grouped metabolites using a weighted gene co-expression network analysis (WGCNA) approach as the secondary analysis. Here, we also included amino acid-related metabolites that were included in our prior publication[5]. In total, there were 381 metabolites measured across three platforms (C8-positive, HILIC-positive, and HILIC-negative) included in the WGCNA analysis [32]. We followed the methods described in detail previously [9]. Briefly, a metabolite co-expression network is created using data in the controls. The metabolites in the network were clustered using a measure of network proximity. A soft-thresholding power of 3 was selected based on scale-free topology criteria. Modules were then identified using hierarchical clustering and dynamic tree cutting, with a minimum module size of 10 and a merge threshold of 0.25 for closely related modules. Each module was assigned a module score based on the loading on the first principal component of each constituent metabolite, also derived among the controls [9, 32]. Here, the module score is a weighted linear combination of the metabolites included in the module. Module scores were then included in multivariable-adjusted conditional logistic regression models, which were used to estimate associations between each module and incident breast cancer [9, 32].
We also estimated associations within subgroups defined at baseline, including 1) restricting to premenopausal women at blood collection and 2) stratifying by baseline BMI (<25 vs. ≥25 kg/m2) and by ER status. In addition, we performed three sensitivity analyses to assess the potential influence of medical and dietary factors, as well as fasting and recency status of blood samples: 1) further adjusting for comorbidities (hypertension and high cholesterol), dietary fat intake, carbohydrate intake, and glycemic index; 2) stratifying by fasting status; and 3) stratifying by median time from blood collection to breast cancer diagnosis (≤6.5 vs. ≥6.5 years) to evaluate whether the associations between identified biomarkers and breast cancer risk varied by proximity to diagnosis.
For each association described above, we reported a continuous P-value. We interpreted these values as one piece of evidence within the totality of evidence, including point estimates and confidence intervals. In addition, because investigators are sometimes concerned with potential chance findings when many statistical comparisons are conducted, we estimated adjusted P-values that accounted for multiple comparisons using two approaches. For the individual metabolite analyses, we estimated the number of effective tests (NET) as 137 using the standard method [33], which accounts for the correlation structure among metabolites to estimate the number of independent tests, based on the eigenvalues of the correlation matrix of metabolite features. The adjusted P-values were further calculated as: Padj = Punadjusted/NET. For the grouped metabolites analyses, we estimated False Discovery Rate (FDR)-adjusted P-values based on the q-value procedure [34]. Metabolites with NET-adjusted p-value < 0.0004 (0.05/137 = 0.0004) and metabolite sets that met FDR-adjusted p-value < 0.05 were considered statistically significant. All statistical tests were two-sided.
Results
As shown in Table 1, 1055 cases and 1055 matched controls were included in this analysis. The mean age was 45 at blood collection (SD: 4.5), and 77% of women were premenopausal. Cases and controls were generally comparable for most of the characteristics, though cases drank more alcohol, and were more likely to have a personal history of benign breast disease and a family history of breast cancer.
Six individual metabolites were nominally associated with breast cancer incidence (Table 2 and Supplementary Table 1) across all three models. From the fully adjusted model (model 3), the bile acid taurodeoxycholate, was positively associated with breast cancer risk (OR = 1.15; 95% CI = 1.04–1.28). Remaining metabolites were all inversely associated with overall breast cancer risk, including C16:1 cholesteryl ester (CE) (OR = 0.88; 95% CI = 0.79–0.97), three phosphocholine (PC)-related metabolites (C34:1 PC OR = 0.87; 95% CI = 0.78–0.98; C34:3 PC OR = 0.88; 95% CI = 0.79–0.98; C32:1 PC OR = 0.88; 95% CI = 0.79–0.98), and indoxyl sulfate (OR = 0.90; 95% CI = 0.82–1.00). While the direction and magnitude of associations were generally consistent with the main analysis after restricting the study population to premenopausal women at the time of blood collection (Supplementary Table 2), the associations did not remain statistically significant for C16:1 CE and C32:1 PC (Supplementary Table 3). For subgroup analyses by BMI and ER status, associations for the nominally significant metabolites appeared to vary across subgroups (Supplementary Table 2). For example, among women with BMI < 25 kg/m2, associations were strongest for were PC- and CE-related metabolites. Among women with BMI ≥ 25 kg/m2, associations were strongest for triglycerides (TAGs) with ≥3 double bonds (DBs) (inverse associations). For ER+ tumors, TAGs with <3 DBs were the top nominally significant ones (inverse associations), while for ER- tumors, three organic acid-relevant metabolites (alpha-keto isovalerate, 2-hydroxyglutarate, indole acetate) were the top ones (positive associations). Individual metabolite results were similar after restricting to women with fasting blood samples and additionally adjusting for comorbidities, dietary fat intake, carbohydrate intake, and dietary glycemic index. The results were also largely consistent across strata defined by time from blood collection to breast cancer diagnosis (≤6.5 years vs. >6.5 years), suggesting that the associations are robust to timing of blood collection and proximity to diagnosis (Supplementary Table 4).
MSEA identified several classes of metabolites significantly associated with breast cancer incidence after FDR correction (Fig. 1 and Supplementary Table 5). In the fully adjusted model (Fig. 1), TAGs with <3 DBs (normalized enrichment score (NES) = −2.54; Padj < 0.00001) and PCs (NES = −2.12; Padj = 0.002) were significantly inversely associated with the incidence of overall breast cancer. PC plasmalogens (NES = 2.23; Padj < 0.001) and PE plasmalogens (NES = 1.83; Padj = 0.02) were positively associated with breast cancer. After stratifying by ER status and BMI and after restricting to premenopausal women at blood draw (Fig. 2), inverse associations with TAGs with <3 DBs were still evident across all subgroups. Interestingly, some classes exhibited associations in opposite directions across different subgroup. For example, TAGs with ≥3 DBs were significantly positively associated with breast cancer among women with BMI < 25 kg/m2 (NES = 2.58; Padj < 0.001) but inversely associated among those with BMI ≥ 25 kg/m2 (NES = −2.35; Padj < 0.001) (Supplementary Table 5). Enrichment patterns of other classes, for example, PC plasmalogen, PE plasmalogen, and CE, differed by ER status. Similar findings were also observed in the stratified analyses by fasting blood status and by median time since blood collection.
Stars denote P values adjusted by FDR: *Padj < 0.2; **Padj < 0.05; ***Padj < 0.001. Darker blue represents a more negative enrichment score (inverse association with breast cancer); darker red represents a more positive enrichment score (positive association with breast cancer). Model 1: Unadjusted conditional logistic regression model; Model 2: Model 1+ BMI at age 18, weight change (from age 18 to time of first blood draw); Model 3: Model 2+ age at menarche, parity and age at first birth, breastfeeding history, family history of breast cancer in a first degree relative, personal history of benign breast disease, physical activity, alcohol intake (by quintile), and oral contraceptive use at blood collection. DB double bound, FDR false discovery rate.
Stars denote P values adjusted by FDR: *Padj < 0.2; **Padj < 0.05; ***Padj <0.001. Darker blue represents a more negative enrichment score (inverse association with breast cancer); darker red represents a more positive enrichment score (positive association with breast cancer). The model for the subgroups is based on the unconditional logistic regression model accounting for matching factors and with further adjustment for BMI at age 18, weight change (from age 18 to time of first blood draw), age at menarche, parity and age at first birth, breastfeeding history, family history of breast cancer in a first degree relative, personal history of benign breast disease, physical activity, alcohol intake (by quintile), and oral contraceptive use at blood collection. DB double bound, FDR false discovery rate.
Nine metabolite modules were identified from the WGCNA analysis (Table 3, Supplementary Tables 6 and 7, and Supplementary Fig. 1) based on the complete set of 381 metabolites across three profiling platforms. Although these modules were not purely defined by a particular class, most modules had one or a few leading classes. For example, module 2 (M2) was characterized by TAGs <3 DBs, which explained 42% of the variation in metabolite levels (Table 3). Modules were modestly associated with overall breast cancer; however, associations were stronger among certain subgroups (Supplementary Table 7). For example, with ER+ breast cancer, module 2 (characterized by TAG < 3 DBs, OR = 0.83; 95% CI = 0.71–0.97), module 5 (characterized by TAG ≥ 3 DBs and amino acids, OR = 0.80; 95% CI = 0.68–0.94), and module 9 (characterized by PC and sphingomyelins, OR = 0.82; 95% CI = 0.70–0.96) were inversely associated with incidence.
Discussion
In this prospective, nested case-control study of predominantly premenopausal (73%) women, we observed that several lipid metabolites from various subclasses were associated with breast cancer incidence. Notably, TAGs with <3 DBs and PCs were inversely associated with breast cancer incidence, both overall and across different subgroups. In contrast, PC plasmalogens and PE plasmalogens were positively associated only with overall breast cancer incidence. The associations of TAGs with ≥3 DBs with breast cancer significantly varied by BMI, and differential associations with PE plasmalogen, PC plasmalogen, and CE were observed by tumor ER status.
TAGs are composed of a glycerol backbone and three fatty acid chains [35]. The fatty acids can have varying degrees of saturation, which refers to the absence (saturated) or presence (unsaturated) of DBs between carbon atoms in the fatty acid chain [35]. We did not observe a strong association between total TAGs and breast cancer. However, classifying by DB number, we observed that TAGs with <3 DBs (n = 30, 21% DB = 0, carbon atom range 43–56) were strongly inversely associated with breast cancer incidence. Contrary to this finding, we previously reported that TAGs with <3 DBs were positively associated with breast cancer incidence in an NHS nested case-control study, in which all women were postmenopausal at blood collection [9], which suggests potential heterogeneity of associations by menopausal status. Other relevant studies did not have specific TAG subclass measurements available, or results were not published [11, 12, 16,17,18,19, 23], which precludes direct comparisons. While some studies have explored the potential association between blood lipid levels (including TAGs) and breast cancer incidence, findings are conflicting, as summarized in a recent meta-analysis [36].
Our findings on TAG with <3 DBs are also in contrast with the observations from type 2 diabetes studies. Rhee et al. found that TAGs of lower carbon atom number (range 44–50) and DB (range 0-3) content were associated with an increased risk of diabetes in the Framingham Heart Study [37]. This association appeared to be independent of metabolic factors and was confirmed in a recent meta-analysis, with a meaningful trend of higher T2D risk with lower DBs in TAGs [38]. Saturated TAGs (DB = 0) were also reported in our previous work to be associated with lower diet quality and higher red meat and trans-fat intake [39, 40], whereas long-chain TAGs were linked to healthier dietary components (e.g., nuts, whole grains) [39]. In contrast, in the PREDIMED Trial, an opposite finding was observed, where TAGs with <3 DBs and odd-chain were inversely associated with T2D [41]. It has been shown that TAGs with fewer DBs are associated with insulin resistance, and insulin resistance often accompanies metabolic dysregulation (hyperinsulinemia, sex hormone imbalance, chronic inflammation, obesity, etc.) [42]. It is possible that the mixture effects of these pathways are influencing breast carcinogenesis differently among premenopausal women (vs. postmenopausal women), and this is consistent with BMI being inversely associated with breast cancer risk in premenopausal women and the opposite for postmenopausal women. In addition, there is some suggestion from our prior work that metabolic dysregulation-related metabolites (e.g., DMGV and branched-chain amino acid) were inversely associated with breast cancer in premenopausal women [5, 10]. More work is necessary to extend and validate our findings.
We also identified several important associations between PCs (n = 21, carbon atom range 30–40, DB range 0–10) and plasmalogens (n = 24, carbon atom range 30–40, DB range 0–7) and breast cancer incidence. These metabolites are subclasses of glycerophospholipids that play a crucial role in cell membrane structure and cell signaling [35]. For PCs, our finding of an inverse association with breast cancer was consistent with several cohort studies, including the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort [16, 43], SU.VI.MAX cohort [19], and Cancer Prevention Study (CPS) III [22]. However, levels of PC ae C30:0 were associated with increased breast cancer in EPIC-Heidelberg [23]. The role of PCs in carcinogenesis is not fully understood – our observed inverse association could be related to their anti-inflammatory properties, protection from oxidative stress, and reduction of cell proliferation [44]. For example, His et al. found that better adherence to a healthy lifestyle was associated with higher levels of several PCs [13]. In our prior publication, higher levels of metabolites in PC class were also associated with adiposity [45], as well as higher levels of physical activity [46], carotenoid intake [47], coffee consumption [48], and overall diet quality [39], while being inversely associated with red meat and soda consumption [40, 49, 50].
Similarly, plasmalogens have also been shown to have anti-inflammatory and/or antioxidant functions [41]. While we identified two plasmalogens (head groups are either PC or ethanolamine (PE)) that were positively associated with the incidence of breast cancer, other cohorts of predominantly postmenopausal women, including EPIC, CPSII, and CPSIII identified inverse associations between these metabolites and breast cancer [11, 16, 22]. These findings are challenging to interpret. One hypothesis is that, given that PC and PE plasmalogens comprise ~20% of total human membrane phospholipid [51, 52], higher levels of plasmalogen may indicate changes in the formation and maintenance of cellular membranes [5]. Perturbed membrane metabolism may lead to a corrupted surrounding tissue microenvironment and promote cell proliferation and dissemination [53]. In previous NHS publications, plasmalogens with fewer double bonds were associated with lower overall diet quality and higher intake of red meat and trans fats [39, 40, 49, 54].
Interestingly, we found that the association of TAGs with ≥3 DBs and breast cancer incidence differed by BMI, with a higher incidence among women with normal BMI, but a substantially lower incidence among overweight and obese women. In this same cohort, we previously reported significant inverse associations between erythrocyte membrane n-3 polyunsaturated fatty acids (PUFA) and breast cancer that was only observed among overweight/obese women in the NHSII [55]. Our measured TAGs with ≥3 DBs are composed of long-chain (carbon atoms 52-60) PUFAs, which may indicate potential anti-inflammatory properties. Although we could not determine the number and position of the DBs, it is possible that the anti-inflammatory effects of PUFAs may be more evident in a state of systematic inflammation (e.g., resulting from obesity) than among those with normal BMI [55]. Other differential analyses were also observed for plasmalogen and CE by tumor ER status. Although heterogeneity was not observed in our previous NHS publication [9] or not reported in other studies [11, 12, 16,17,18,19, 22, 23], our results may indicate a complex relationship underpinning lipid metabolism, estrogen, and breast cancer development. Because the underlying biological mechanisms are unclear and because we conducted many analyses within each subgroup, it is also important to note that these subgroup findings may be due to chance. Further replication of these results is needed.
Regarding individual metabolites, taurodeoxycholate was the only one to show a suggestive statistically significant association with breast cancer risk after multiple comparison correction in crude and adiposity-adjusted models. This secondary finding may warrant further consideration given prior evidence linking bile acid metabolism to cancer development and progression [56,57,58]. Taurodeoxycholate, a taurine-conjugated secondary bile acid, has been associated with metabolic and inflammatory pathways that may influence breast carcinogenesis [57, 58]. While our findings should be interpreted with caution due to the lack of robustness after full adjustment, the observed direction of association may suggest a potential biological role that merits investigation in future studies.
Some inconsistencies between our study and other publications may be attributed to multiple factors. First, our study population was predominantly premenopausal at the time of blood collection, while others were mainly composed of postmenopausal women. It is possible that the different findings between lipid metabolites (especially TAG, PC, and plasmalogen) and breast cancer incidence in premenopausal vs. postmenopausal women potentially indicate that these metabolites play distinct metabolic roles at different stages in a woman’s life [45]. Second, it is not straightforward to compare metabolomic profiling platforms across studies, and the associations may depend on platform capacity and, for some metabolites, lipid chains. For example, our measured TAG, PC, and plasmalogen are all long chains with regard to carbon atoms (range 30–60), while others reported significant findings on some short- or medium-chain metabolites (carbon atom <30) [11, 19, 22]. Moreover, the specific timing of blood collection, fasting blood status, and the time from blood draw to diagnosis also differed across studies, though key results remained similar in our sensitivity analyses that investigated variations in associations by these factors.
This study represents a large, prospective investigation of associations between metabolomic profiles and breast cancer incidence among a mixed population that includes 73% premenopausal women. We had relatively broad coverage of multiple metabolite classes, and we were able to assess the extent to which associations were independent of multiple key breast cancer risk factors and metabolic factors. There are also limitations of this study. First, while the platform includes some coverage of amino acids, nucleotides, and other small molecules, it is primarily enriched for lipid-related metabolites; we used a semi-targeted approach, which is fewer than others who have used an untargeted approach (e.g., CPSIII metabolites n > 800); therefore, our platform may not provide comprehensive coverage of metabolite classes, e.g., the plasma lipidome. Second, these metabolites were measured at only one point in time. However, overall, the identified metabolites in NHS were relatively stable over time (ICCs or correlation over 1–2 years ≥0.59) [28]. Third, we could not identify specific fatty acid compositions and methyl end locations to subclass PUFA types. Fourth, we could not directly assess the extent to which associations were independent of circulating levels of total cholesterol, HDL cholesterol, LDL cholesterol, and fasting glucose. However, results were similar after adjusting for hyperlipidemia status. Fifth, we aimed to estimate associations between metabolites and breast cancer incidence (rather than the causal effects of well-defined exposures), and our results should be interpreted accordingly [59]. Lastly, an independent validation dataset is lacking here. Further population studies focusing on premenopausal women are needed to validate our findings.
In sum, in this prospective study of metabolomic profiles and breast cancer risk, we found that several lipid metabolites from various subclasses (TAGs with <3 DBs, PCs, and plasmalogens) were associated with breast cancer incidence. Variations of associations with other metabolite classes were observed by adiposity and tumor ER status. Further validation of our findings in independent cohorts of premenopausal women is warranted.
Data availability
Due to participant privacy and data use agreements, our study data are not publicly available. Investigators interested in accessing NHSII data may submit a research proposal to the data access committee via the Nurses’ Health Study website (https://www.nurseshealthstudy.org/researchers). All proposals are subject to review and approval by the NHS steering committee to ensure consistency with participant consent and study policies.
Code availability
The final analytic programs used to generate derived data and study results are systematically documented and archived within the NHS secure intranet environment to ensure reproducibility. These programs are accessible to both internal and external researchers via the cohort intranet, contingent upon approved user login and data use agreements.
References
Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71:209–49.
Renehan AG, Zwahlen M, Egger M. Adiposity and cancer risk: new mechanistic insights from epidemiology. Nat Rev Cancer. 2015;15:484–98.
Rubinstein MM, Brown KA, Iyengar NM. Targeting obesity-related dysfunction in hormonally driven cancers. Br J Cancer. 2021;125:495–509.
Moore SC. Metabolomics and breast cancer: scaling up for robust results. BMC Med. 2020;18:18.
Zeleznik OA, Balasubramanian R, Zhao Y, Frueh L, Jeanfavre S, Avila-Pacheco J, et al. Circulating amino acids and amino acid-related metabolites and risk of breast cancer among predominantly premenopausal women. NPJ Breast Cancer. 2021;7:54.
Giaquinto AN, Sung H, Miller KD, Kramer JL, Newman LA, Minihan A, et al. Breast cancer statistics, 2022. CA Cancer J Clin. 2022;72:524–41.
World Cancer Research Fund/American Institute for Cancer Research. Continuous Update Project Expert Report. Diet, nutrition, physical activity, and breast cancer. 2018. https://www.wcrf.org/research-policy/global-cancer-update-programme/.
Schmidt DR, Patel R, Kirsch DG, Lewis CA, Vander Heiden MG, Locasale JW. Metabolomics in cancer research and emerging applications in clinical oncology. CA Cancer J Clin. 2021;71:333–58.
Brantley KD, Zeleznik OA, Rosner B, Tamimi RM, Avila-Pacheco J, Clish CB, et al. Plasma metabolomics and breast cancer risk over 20 years of follow-up among postmenopausal women in the Nurses’ Health Study. Cancer Epidemiol Biomark Prev. 2022;31:839–50.
Zeleznik OA, Balasubramanian R, Ren Y, Tobias DK, Rosner BA, Peng C, et al. Branched-chain amino acids and risk of breast cancer. JNCI Cancer Spectr. 2021;5:pkab059.
Moore SC, Mazzilli KM, Sampson JN, Matthews CE, Carter BD, Playdon MC, et al. A metabolomics analysis of postmenopausal breast cancer risk in the cancer prevention study II. Metabolites. 2021;11:95.
Jobard E, Dossus L, Baglietto L, Fornili M, Lécuyer L, Mancini FR, et al. Investigation of circulating metabolites associated with breast cancer risk by untargeted metabolomics: a case–control study nested within the French E3N cohort. Br J Cancer. 2021;124:1734–43.
His M, Viallon V, Dossus L, Schmidt JA, Travis RC, Gunter MJ, et al. Lifestyle correlates of eight breast cancer-related metabolites: a cross-sectional study within the EPIC cohort. BMC Med. 2021;19:312.
Tobias DK, Hazra A, Lawler PR, Chandler PD, Chasman DI, Buring JE, et al. Circulating branched-chain amino acids and long-term risk of obesity-related cancers in women. Sci Rep. 2020;10:16534.
Lécuyer L, Dalle C, Lefevre-Arbogast S, Micheau P, Lyan B, Rossary A, et al. Diet-related metabolomic signature of long-term breast cancer risk using penalized regression: an exploratory study in the SU.VI.MAX cohort. Cancer Epidemiol Biomark Prev. 2020;29:396–405.
His M, Viallon V, Dossus L, Gicquiau A, Achaintre D, Scalbert A, et al. Prospective analysis of circulating metabolites and breast cancer in EPIC. BMC Med. 2019;17:178.
Yoo HJ, Kim M, Kim M, Kang M, Jung KJ, Hwang SM, et al. Analysis of metabolites and metabolic pathways in breast cancer in a Korean prospective cohort: the Korean Cancer Prevention Study-II. Metabolomics. 2018;14:85.
Moore SC, Playdon MC, Sampson JN, Hoover RN, Trabert B, Matthews CE, et al. A metabolomics analysis of body mass index and postmenopausal breast cancer risk. J Natl Cancer Inst. 2018;110:588–97.
Lécuyer L, Victor Bala A, Deschasaux M, Bouchemal N, Nawfal Triba M, Vasson MP, et al. NMR metabolomic signatures reveal predictive plasma metabolites associated with long-term risk of developing breast cancer. Int J Epidemiol. 2018;47:484–94.
Playdon MC, Ziegler RG, Sampson JN, Stolzenberg-Solomon R, Thompson HJ, Irwin ML, et al. Nutritional metabolomics and breast cancer risk in a prospective study. Am J Clin Nutr. 2017;106:637–49.
His M, Gunter MJ, Keski-Rahkonen P, Rinaldi S. Application of metabolomics to epidemiologic studies of breast cancer: new perspectives for etiology and prevention. J Clin Oncol. 2024;42:103–15.
Stevens VL, Carter BD, Jacobs EJ, McCullough ML, Teras LR, Wang Y. A prospective case-cohort analysis of plasma metabolites and breast cancer risk. Breast Cancer Res. 2023;25:5.
Kühn T, Floegel A, Sookthai D, Johnson T, Rolle-Kampczyk U, Otto W, et al. Higher plasma levels of lysophosphatidylcholine 18:0 are related to a lower risk of common cancers in a prospective metabolomics study. BMC Med. 2016;14:13.
Rockhill B, Willett WC, Hunter DJ, Manson JE, Hankinson SE, Spiegelman D, et al. Physical activity and breast cancer risk in a cohort of young women. J Natl Cancer Inst. 1998;90:1555–1160.
Tworoger SS, Sluss P, Hankinson SE. Association between plasma prolactin concentrations and risk of breast cancer among predominately premenopausal women. Cancer Res. 2006;66:2476–82.
Bao Y, Bertoia ML, Lenart EB, Stampfer MJ, Willett WC, Speizer FE, et al. Origin, methods, and evolution of the three Nurses’ Health Studies. Am J Public Health. 2016;106:1573–81.
Paynter NP, Balasubramanian R, Giulianini F, Wang DD, Tinker LF, Gopal S, et al. Metabolic Predictors of Incident Coronary Heart Disease in Women. Circulation. 2018;137:841–53.
Townsend MK, Clish CB, Kraft P, Wu C, Souza AL, Deik AA, et al. Reproducibility of metabolomic profiles among men and women in 2 large cohort studies. Clin Chem. 2013;59:1657–67.
O’Sullivan JF, Morningstar JE, Yang Q, Zheng B, Gao Y, Jeanfavre S, et al. Dimethylguanidino valeric acid is a marker of liver fat and predicts diabetes. J Clin Investig. 2017;127:4394–402.
Wang T, Farvid MS, Kang JH, Holmes MD, Rosner BA, Tamimi RM, et al. Diabetes risk reduction diet and survival after breast cancer diagnosis. Cancer Res. 2021;81:4155–62.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102:15545–50.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 2008;9:559.
Li J, Ji L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity. 2005;95:221–7.
Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. Ann Stat. 2003;31:2013–35.
Snaebjornsson MT, Janaki-Raman S, Schulze A. Greasing the wheels of the cancer machine: the role of lipid metabolism in cancer. Cell Metab. 2020;31:62–76.
Nouri M, Mohsenpour MA, Katsiki N, Ghobadi S, Jafari A, Faghih S, et al. Effect of serum lipid profile on the risk of breast cancer: systematic review and meta-analysis of 1,628,871 women. J Clin Med. 2022;11:4503.
Rhee EP, Cheng S, Larson MG, Walford GA, Lewis GD, McCabe E, et al. Lipid profiling identifies a triacylglycerol signature of insulin resistance and improves diabetes prediction in humans. J Clin Invest. 2011;121:1402–11.
Morze J, Wittenbecher C, Schwingshackl L, Danielewicz A, Rynkiewicz A, Hu FB, et al. Metabolomics and type 2 diabetes risk: an updated systematic review and meta-analysis of prospective cohort studies. Diabetes Care. 2022;45:1013–24.
Bagheri M, Willett W, Townsend MK, Kraft P, Ivey KL, Rimm EB, et al. A lipid-related metabolomic pattern of diet quality. Am J Clin Nutr. 2020;112:1613–30.
Wang F, Chandler PD, Zeleznik OA, Wu K, Wu Y, Yin K, et al. Plasma metabolite profiles of red meat, poultry, and fish consumption, and their associations with colorectal cancer risk. Nutrients. 2022;14:978.
Razquin C, Toledo E, Clish CB, Ruiz-Canela M, Dennis C, Corella D, et al. Plasma Lipidomic Profiling and Risk of Type 2 Diabetes in the PREDIMED Trial. Diabetes Care. 2018;41:2617–24.
Hsu IR, Kim SP, Kabir M, Bergman RN. Metabolic syndrome, hyperinsulinemia, and cancer. Am J Clin Nutr. 2007;86:867S–71S.
Breeur M, Ferrari P, Dossus L, Jenab M, Johansson M, Rinaldi S, et al. Pan-cancer analysis of pre-diagnostic blood metabolite concentrations in the European Prospective Investigation into Cancer and Nutrition. BMC Med. 2022;20:351.
Küllenberg D, Taylor LA, Schneider M, Massing U. Health effects of dietary phospholipids. Lipids Health Dis. 2012;11:3.
Brantley KD, Zeleznik OA, Dickerman BA, Balasubramanian R, Clish CB, Avila-Pacheco J, et al. A metabolomic analysis of adiposity measures and pre- and postmenopausal breast cancer risk in the Nurses’ Health Studies. Br J Cancer. 2022;127:1076–85.
Ding M, Zeleznik OA, Guasch-Ferre M, Hu J, Lasky-Su J, Lee IM, et al. Metabolome-wide association study of the relationship between habitual physical activity and plasma metabolite levels. Am J Epidemiol. 2019;188:1932–43.
Peng C, Zeleznik OA, Shutta KH, Rosner BA, Kraft P, Clish CB, et al. A metabolomics analysis of circulating carotenoids and breast cancer risk. Cancer Epidemiol Biomark Prev. 2022;31:85–96.
Hang D, Zeleznik OA, He X, Guasch-Ferre M, Jiang X, Li J, et al. Metabolomic signatures of long-term coffee consumption and risk of type 2 diabetes in women. Diabetes Care. 2020;43:2588–96.
Lee DH, Jin Q, Shi N, Wang F, Bever AM, Liang L, et al. The metabolic potential of inflammatory and insulinaemic dietary patterns and risk of type 2 diabetes. Diabetologia. 2024;67:88–101.
Wang F, Glenn AJ, Tessier AJ, Mei Z, Haslam DE, Guasch-Ferré M, et al. Integration of epidemiological and blood biomarker analysis links haem iron intake to increased type 2 diabetes risk. Nat Metab. 2024;6:1807–18.
Braverman NE, Moser AB. Functions of plasmalogen lipids in health and disease. Biochim Biophys Acta. 2012;1822:1442–52.
Bozelli JC Jr., Azher S, Epand RM. Plasmalogens and chronic inflammatory diseases. Front Physiol. 2021;12:730829.
Pavlova NN, Thompson CB. The emerging hallmarks of cancer metabolism. Cell Metab. 2016;23:27–47.
Ivey KL, Rimm EB, Kraft P, Clish CB, Cassidy A, Hodgson J, et al. Identifying the metabolomic fingerprint of high and low flavonoid consumers. J Nutr Sci. 2017;6:e34.
Hirko KA, Chai B, Spiegelman D, Campos H, Farvid MS, Hankinson SE, et al. Erythrocyte membrane fatty acids and breast cancer risk: a prospective analysis in the nurses’ health study II. Int J Cancer. 2018;142:1116–29.
Fleishman JS, Kumar S. Bile acid metabolism and signaling in health and disease: molecular mechanisms and therapeutic targets. Sig Transduct Target Ther. 2024;9:97.
Mohanty I, Allaband C, Mannochio-Russo H, El Abiead Y, Hagey LR, Knight R, et al. The changing metabolic landscape of bile acids - keys to metabolism and immune regulation. Nat Rev Gastroenterol Hepatol. 2024;21:493–516.
Režen T, Rozman D, Kovács T, Kovács P, Sipos A, Bai P, et al. The role of bile acids in carcinogenesis. Cell Mol Life Sci. 2022;79:243.
Hernán MA, Hsu J, Healy B. A second chance to get causal inference right: a classification of data science tasks. Chance. 2019;32:42–9.
Acknowledgements
The authors would like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention’s National Program of Cancer Registries (NPCR) and/or the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program. Central registries may also be supported by state agencies, universities, and cancer centers. Participating central cancer registries include the following: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Indiana, Iowa, Kentucky, Louisiana, Massachusetts, Maine, Maryland, Michigan, Mississippi, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico, Rhode Island, Seattle SEER Registry, South Carolina, Tennessee, Texas, Utah, Virginia, West Virginia, Wyoming.
Funding
This work is supported in part by grants from the National Institutes of Health (U01 CA176726, R01 CA050385, T32 CA009001, K99 CA267557, and R00 CA267557) and by funding from the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Authors and Affiliations
Contributions
TW, AHE, WCW, JAP, CBC, and BAR contributed to study conception and design, with development of metabolomics analytical pipeline and statistical analysis led by OAZ. TW carried out primary data analysis. AHE supervised the project. TW wrote the first draft of the manuscript. TW, OAZ, EEM, KDB, RB, BAR, WCW, JAP, CBC, and AHE contributed to critical revision and editing of the manuscript and have approved the final version. TW and AHE were responsible for the decision to submit the manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, T., Zeleznik, O.A., McGee, E.E. et al. Prospective study of circulating metabolomic profiles and breast cancer incidence among predominantly premenopausal women. Br J Cancer (2025). https://doi.org/10.1038/s41416-025-03159-2
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41416-025-03159-2