Investigating the relationship between breast cancer risk factors and an AI-generated mammographic texture feature in the Nurses’ Health Study II

Wu, Xueyao; Jiang, Shu; Ge, Aaron; Turman, Constance; Colditz, Graham; Tamimi, Rulla M.; Kraft, Peter

doi:10.1038/s41523-025-00870-4

Article
Open access
Published: 23 December 2025

npj Breast Cancer volume 12, Article number: 5 (2026)

Abstract

The mammogram risk score (MRS), an AI-driven mammographic texture feature, strongly predicts breast cancer risk independently of breast density, though underlying mechanisms remain unclear. Using data from the Nurses’ Health Study II (292 cases, 561 controls), we validated MRS’s association with breast cancer and evaluated its relationships with established breast cancer risk factors through observational analyses, polygenic score analyses, and Mendelian randomization. MRS was significantly associated with breast cancer risk before (OR = 1.92 per SD increase; 95% CI: 1.57 to 2.35; 10-year AUC = 0.69) and after adjustment for predicted BI-RADS density (OR = 1.85; 95% CI: 1.49 to 2.30). Early life body size and adult body mass index (BMI) were inversely associated with MRS, while benign breast disease history and predicted BI-RADS density showed positive associations; after adjusting for density, associations between MRS and the other three risk factors were attenuated. Polygenic score analyses and Mendelian randomization consistently demonstrated significant positive associations between genetic predictors of breast density measures (dense area, percent density, predicted BI-RADS density) and MRS. After adjusting for predicted BI-RADS density and BMI, genetic predictors of higher waist-to-hip ratio were significantly associated with increased MRS. Our findings reveal robust associations between breast density measures and MRS and suggest a potential impact of central obesity on MRS. Future larger-scale validation studies are needed.

Introduction

Breast cancer remains the most commonly diagnosed cancer among women worldwide¹. Advances in mammographic screening have facilitated early detection and enabled risk stratification based on breast density², but traditional measures of mammographic density primarily evaluate the relative amount of fibroglandular tissue (i.e., the functional breast tissue composed of epithelial and stromal cells). This approach limits our ability to fully capture the heterogeneity of individual breast tissue features, such as architecture and spatial relations³. Recent research has leveraged accumulating digital mammogram datasets and sophisticated computational techniques to quantify texture features of mammograms, aiming for more precise and individualized risk predictions. These texture features capture detailed patterns and variations in breast tissue that go beyond simple density measurements⁴. A notable development in this area is the mammogram risk score (MRS), an artificial intelligence (AI)-driven texture feature derived from whole mammogram images that robustly predicts breast cancer risk independently of breast density (5-year area under the receiver operating characteristic curve [AUC] = 0.75)^5,6. However, because the MRS is derived from a deep learning model that lacks inherent interpretability, its biological underpinnings remain unclear.

The risk of breast cancer is influenced by multiple factors beyond age and genetic markers. Lifestyle, behavioral, and developmental factors, such as anthropometric measures and reproductive events, collectively contribute to breast cancer susceptibility⁷ and may also relate to features in breast tissue. Epidemiological studies have highlighted significant associations between traditional measures of mammographic density and various risk factors, including early life and adult adiposity^8,9,10,11, height^12,13, age at menarche^12,13, age at first birth^14,15, age at natural menopause¹⁶, and other reproductive/hormonal factors¹⁷. Mendelian randomization (MR) studies, which use germline genetic variants as instrumental variables (IVs) to strengthen causal inference and are less susceptible to the confounding and reverse causation typical of observational studies²⁰, have reinforced the associations with early life and adult adiposity^18,19.

Because texture features capture aspects of breast tissue distinct from those reflected in summary density measures, investigating how established risk factors relate to these features could improve our understanding of their underlying biology and provide valuable insights into breast cancer pathogenesis. Previous studies have demonstrated phenotypic and genetic relationships between adiposity and V^21,22, a texture feature reflecting grayscale intensity variations on digitized film mammograms²³. MRS, by comparison, was developed using supervised machine learning not only to predict variation in whole digital images more accurately but also to capture biological features relevant to breast cancer risk⁶. These characteristics make MRS a promising target for investigations aimed at advancing breast cancer prevention. Yet, to date, no observational or MR study has explored these associations for the MRS.

With the overarching goal of deepening the understanding of the biological underpinnings of MRS and its potential role in breast cancer susceptibility, the present study investigates the relationships between established breast cancer risk factors (anthropometrics, reproductive and hormonal factors, family history, and traditional mammographic density metrics) and MRS through comprehensive observational and genetic analyses performed within the Nurses’ Health Study II (NHS II).

Results

Participant characteristics and MRS-breast cancer association in the NHS II

This nested case-control study comprised 853 women (292 cases and 561 controls) with a mean (SD) age of 55.3 (5.45) years at the time of the mammogram. The majority (64.1%) were postmenopausal. Table 1 compares the distribution of risk factors between groups divided at the median MRS. Compared with participants below the median, those with above-median MRS were younger, more likely to have a lower body mass index (BMI) and waist-to-hip ratio (WHR), denser breasts, and a history of benign breast disease at the time of mammogram, and more likely to be breast cancer cases (47.8% vs 20.7%) (all P < 0.05).

Table 1 Baseline characteristics of participants according to median mammogram risk score

External validation in NHS II, which included 201 cases and 561 controls after excluding cases diagnosed within 6 months of mammography, demonstrated a strong association between MRS and breast cancer risk (odds ratio [OR] = 1.92 per standard deviation [SD] difference in MRS; 95% confidence interval [CI]: 1.57 to 2.35; P = 1.98 × 10⁻¹⁸; 10-year AUC = 0.69) (Supplementary Table 1, Supplementary Figs. 1 and 2). Cases were diagnosed 0.5–10.1 years (median 2.6 years) after the mammogram used for MRS calculation. The association remained robust after adjustment for predicted Breast Imaging Reporting and Data System (BI-RADS) density (OR = 1.85; 95% CI: 1.49 to 2.30; P = 2.50 × 10⁻⁸) (Supplementary Table 1).
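Per-SD odds ratios such as these are exponentiated logistic regression coefficients for a standardized MRS. The sketch below shows that conversion, assuming a Wald-type interval (coefficient ± 1.96 standard errors on the log-odds scale). The coefficient and standard error are back-calculated from the published OR and CI purely for illustration, and the helper names are hypothetical rather than taken from the study's scripts.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Exponentiate a log-odds coefficient and its Wald interval (beta +/- z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate standard error implied by a CI that is symmetric on the log-odds scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Back-calculated from the reported OR = 1.92 (95% CI: 1.57 to 2.35); illustrative only
beta_hat = math.log(1.92)
se_hat = se_from_ci(1.57, 2.35)
or_, lo, hi = odds_ratio_ci(beta_hat, se_hat)
print(f"OR = {or_:.2f}; 95% CI: {lo:.2f} to {hi:.2f}")  # OR = 1.92; 95% CI: 1.57 to 2.35
```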

Associations between observed breast cancer risk factors and MRS

Both predicted BI-RADS density (β = 0.31 SD difference in MRS per SD difference in predicted BI-RADS density; 95% CI: 0.25 to 0.38; P = 1.94 × 10⁻²⁰) and history of benign breast disease (β = 0.23 SD difference in MRS with vs. without history; 95% CI: 0.10 to 0.36; P = 4.50 × 10⁻⁴) showed positive associations with MRS. Early life body size (β = −0.08 SD difference in MRS per SD difference in body size; 95% CI: −0.14 to −0.02; P = 9.59 × 10⁻³) and adult BMI (β = −0.08 SD difference in MRS per SD difference in BMI; 95% CI: −0.14 to −0.02; P = 1.11 × 10⁻²) demonstrated negative associations. No statistically significant associations were observed for the other examined risk factors (all P > 0.05, Table 2). These results remained consistent in both direction and statistical significance when restricted to controls only or additionally adjusted for menopausal status (Supplementary Tables 2 and 3). When adjusted for predicted BI-RADS density, associations with history of benign breast disease (β = 0.11), early life body size (β = −0.02), and adult BMI (β = 0.05) were all attenuated towards the null (Table 2, Supplementary Table 2).

Table 2 Linear regression of mammogram risk score on each breast cancer risk factor

Associations between polygenic scores for breast cancer risk factors and MRS

Linear regressions of MRS on polygenic score (PGS) for risk factors revealed significant positive associations for dense area (β = 0.16 SD difference in MRS per SD difference in PGS; 95% CI: 0.06 to 0.25; P = 1.37 × 10⁻³) and percent density (β = 0.14 SD difference in MRS per SD difference in PGS; 95% CI: 0.05 to 0.23; P = 3.29 × 10⁻³). No significant associations were observed between the PGS for other risk factors and MRS (Table 3). A similar pattern of associations was observed in analyses restricted to controls and in models adjusted for menopausal status (Supplementary Tables 4 and 5). After adjusting for predicted BI-RADS density, the association for percent density remained strong, whereas the association for dense area weakened slightly. A significant association was additionally observed between higher PGS for WHR adjusted for BMI (WHRadjBMI) and increased MRS (β = 0.12 SD difference in MRS per SD difference in PGS; 95% CI: 0.03 to 0.21; P = 1.18 × 10⁻²) (Table 3, Supplementary Table 4).

Table 3 Linear regression of mammogram risk score on polygenic score for each breast cancer risk factor

Mendelian randomization between breast cancer risk factors and MRS

Linear regressions of risk factors on their corresponding PGS revealed significant genetic associations for 7 risk factors: height, BMI, age at menarche, early life body size, WHR, predicted BI-RADS density, and age at natural menopause (F-statistics ranging from 5.36 to 212.56; Supplementary Table 6). While the PGS for dense area was significantly associated with predicted BI-RADS density (R² = 0.03, F = 9.09, P = 2.76 × 10⁻³), the PGS for percent density showed no association (R² = 0.00, F = 1.47, P = 0.23). The dense area PGS was therefore used as the IV for predicted BI-RADS density in subsequent two-stage least squares (2SLS) analyses. WHRadjBMI, age at first birth, and number of children ever born were excluded from 2SLS because of weak instrument strength (F-statistic < 5). Associations adjusted for menopausal status or predicted BI-RADS density are detailed in Supplementary Tables 7 and 8.

2SLS analyses found a significant association between genetically predicted BI-RADS density and MRS (β = 0.84 SD difference in MRS per SD difference in predicted BI-RADS density; 95% CI: 0.21 to 1.46; P = 8.85 × 10⁻³), while identifying no statistically significant associations between the other 6 genetically predicted risk factors and MRS. Among the other risk factors, the strongest effect estimates were observed for age at natural menopause (β = −0.80 SD difference in MRS per SD difference in genetically predicted age at natural menopause, 95% CI: −2.58 to 0.98, P = 0.38) and early life body size (β = −0.13 SD difference in MRS per SD difference in genetically predicted early life body size, 95% CI: −0.59 to 0.33, P = 0.57), neither of which reached statistical significance (Table 4). All additional adjustments yielded similar results (Table 4, Supplementary Tables 9 and 10).

Table 4 Two-stage least squares regression between each breast cancer risk factor (exposure) and mammogram risk score (outcome)

Two-sample MR analyses identified significant associations between genetically predicted dense area and MRS (β = 0.83 SD difference in MRS per SD difference in dense area; 95% CI: 0.39 to 1.27; P = 2.09 × 10⁻⁴) and between genetically predicted percent density and MRS (β = 1.14 SD difference in MRS per SD difference in percent density; 95% CI: 0.55 to 1.74; P = 1.61 × 10⁻⁴). There was no evidence of causal associations for the other risk factors (Fig. 1). Sensitivity analyses using MR-Egger regression, weighted median, weighted mode, and inverse-variance weighted (IVW) regression excluding outlier SNPs yielded consistent results (Supplementary Table 11). MR-Clust analysis assigned all variants for dense area and percent density to a single cluster with similar causal effects, suggesting no evidence of heterogeneous causal mechanisms; for the other risk factors, no variants showed significant effects (Supplementary Fig. 3). Chi-square tests on Wald ratios across all IVs for each risk factor revealed no statistically significant associations between risk factor-associated genetic variants and MRS (all P > 0.05, Supplementary Table 12).
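The weighted median estimator used in these sensitivity analyses can be sketched as follows. It orders the per-variant Wald ratios, weights them by inverse variance, and takes the value at which the cumulative weight reaches 50%, so a minority of invalid instruments cannot pull it far. This minimal version assumes first-order weights (se_out² / b_exp²); the TwoSampleMR implementation may additionally account for uncertainty in the exposure associations. The summary statistics are hypothetical.

```python
def wald_ratios(b_exp, b_out, se_out):
    """Per-variant Wald ratios and their first-order standard errors."""
    ratios = [by / bx for bx, by in zip(b_exp, b_out)]
    ses = [so / abs(bx) for bx, so in zip(b_exp, se_out)]
    return ratios, ses


def weighted_median(ratios, ses):
    """Weighted median of Wald ratios using inverse-variance weights."""
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    b = [ratios[j] for j in order]
    w = [1.0 / ses[j] ** 2 for j in order]
    total = sum(w)
    s, cum = [], 0.0
    for wj in w:
        cum += wj
        s.append((cum - 0.5 * wj) / total)  # standardized cumulative weight at each ratio
    below = [k for k, sk in enumerate(s) if sk < 0.5]
    if not below:
        return b[0]
    p = below[-1]
    # Interpolate between the two ratios whose cumulative weights straddle 0.5
    return b[p] + (b[p + 1] - b[p]) * (0.5 - s[p]) / (s[p + 1] - s[p])


# Hypothetical summary statistics: four variants agree (ratio 0.5), one is an outlier (ratio 3.0)
b_exp = [0.10, 0.08, 0.12, 0.05, 0.09]
b_out = [0.05, 0.04, 0.06, 0.025, 0.27]
se_out = [0.02] * 5
ratios, ses = wald_ratios(b_exp, b_out, se_out)
print(round(weighted_median(ratios, ses), 3))
```

Here the weighted median stays at 0.5 despite the outlying variant, whereas an inverse-variance weighted mean of the same ratios would be pulled to about 0.99.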

Fig. 1: Two-sample Mendelian randomization analysis examining associations between genetically predicted risk factors (exposures) and mammogram risk score (outcome).

The patterns of associations were robust to both restriction to control subjects (Supplementary Table 11) and adjustment for menopausal status (Supplementary Tables 13 and 14, Supplementary Fig. 4). When IV-outcome associations were adjusted for predicted BI-RADS density, two-sample MR showed a significant association between genetically predicted WHRadjBMI and MRS (IVW: β = 0.51 SD difference in MRS per SD difference in WHRadjBMI; 95% CI: 0.15 to 0.87; P = 6.10 × 10⁻³) (Fig. 1), consistent across all sensitivity analyses. The association for percent density remained essentially unchanged, whereas the association for genetically predicted dense area was attenuated (Fig. 1, Supplementary Tables 15 and 16, Supplementary Fig. 5).

Discussion

To the best of our knowledge, this study presents one of the first and most comprehensive examinations to date of the relationships between known breast cancer risk factors and MRS, an AI-generated mammographic texture feature derived from full-field digital mammograms. Our analyses revealed robust phenotypic and genetic associations between various mammographic density measures (predicted BI-RADS density, absolute dense area, and percent density) and MRS, as well as a suggestive association between higher WHRadjBMI and higher MRS.

Our external validation of the MRS in the predominantly White NHS II cohort demonstrated its robust predictive capability for breast cancer risk, supporting the generalizability of the algorithm to an independent population. The MRS maintained good discriminatory power for long-term risk, achieving a 10-year AUC of 0.69, which is broadly comparable to the 5-year AUCs reported in the original validation cohorts (0.75 in the Joanne Knight Breast Health Cohort at Washington University [WashU cohort] and 0.74 in the Emory Breast Imaging Dataset [EMBED], in which 27–46% of women were non-Hispanic Black)⁶. This result supports the stability of the MRS as a risk marker over an extended follow-up period. Notably, our mutual adjustment analyses provide insight into the relationships among MRS, breast density, and breast cancer risk. The association between MRS and breast cancer incidence remained strong after adjustment for predicted BI-RADS density, whereas the association for predicted BI-RADS density was substantially attenuated after adjustment for MRS. This pattern suggests that, while both are important risk factors, MRS may capture mammographic information located closer to breast cancer on the causal pathway than summary density measures alone. Collectively, these findings highlight the potential of MRS to enhance breast cancer risk stratification across diverse clinical settings and populations.

Our findings align with previously reported significant phenotypic relationships between mammographic density and breast texture features^17,21,24,25,26,27,28. The moderate correlation between predicted BI-RADS density and MRS (r ≈ 0.31) further supports the view that, while these measures are related, they likely reflect distinct aspects of mammographic information. Beyond the phenotypic association, our genetic analyses provide converging evidence for a shared genetic architecture and a potential causal link between mammographic density and MRS. These findings corroborate and extend previous evidence from different texture measures and study designs that demonstrated similar causal relationships²⁹, enhancing the credibility of MRS as a biologically plausible risk factor for breast cancer. Future studies should aim to elucidate the specific biological processes reflected by MRS and their implications for breast cancer etiology.

Moving beyond density, investigating the effects of lifestyle, behavioral, and developmental/biological risk factors on breast tissue characteristics, as summarized in mammograms, is crucial for identifying modifiable factors for prevention studies and for understanding pathways that could point to preventive drug targets. Although MRS is a novel feature with limited existing literature, it is important to contextualize our findings within the broader body of research on other mammographic features. For instance, previous studies have demonstrated associations between various breast cancer risk factors and mammographic density, and between risk factors and other texture features such as V^21,22. Our study builds upon these findings by focusing on MRS, a supervised, risk-optimized score trained via a ResNet-18 convolutional neural network and validated in large, diverse cohorts to identify the tissue patterns most predictive of future breast cancer incidence. This approach offers a complementary perspective for investigating the biology of risk-relevant mammographic changes.

The emergence of a statistically significant association between genetic predictors of WHRadjBMI and MRS only after adjustment for predicted BI-RADS density suggests that fat distribution, independent of overall body mass, might influence breast tissue characteristics in ways not fully captured by mammographic density alone. That MRS revealed this relationship suggests its value as an advanced imaging feature reflecting nuanced aspects of breast tissue composition that may be relevant to cancer risk assessment. Future studies are needed to validate our results and investigate the biological mechanisms underlying the complex interplay between fat distribution, breast tissue texture features, and breast cancer susceptibility.

Several limitations of our study should be acknowledged. First, our sample size was relatively limited, which may have led to insufficient statistical power to detect associations with some risk factors, particularly those with smaller effect sizes. Null findings should therefore not be interpreted as definitive evidence of no association, and larger studies with greater statistical power are needed. Second, only craniocaudal (CC)-view mammograms were available for our analysis. Because the MRS algorithm is optimized using four views, our reported predictive accuracy likely represents a conservative estimate of its full potential. Nevertheless, the strong performance achieved with this two-view approach underscores the algorithm’s utility in common, real-world scenarios where imaging sets may be incomplete, broadening its applicability across diverse data settings. Third, our analysis was based on a nested case-control design, which could introduce ascertainment bias. However, we expect that this design did not substantially affect our results, given our adjustment for case-control status and the consistency of our findings in control-only analyses³⁰. Fourth, while we evaluated predicted BI-RADS density (which mimics qualitative visual assessments by radiologists)³¹, we were unable to adjust for quantitative density measures because these data were unavailable. This limitation may have reduced our statistical power to detect associations and led us to underestimate relationships between other breast cancer risk factors and MRS. Additionally, our 2SLS analyses may have been affected by phenotype mismatches between the traits used to derive PGS from genome-wide association studies (GWAS) and the corresponding phenotypes measured in our study. For example, using a PGS for quantitative dense area to instrument qualitative BI-RADS density categories could violate the exclusion restriction assumption and bias our causal estimates.

Our study has several notable strengths. A key advantage is the availability of genetic data, digital mammogram data, and comprehensive covariate data on the same set of participants, allowing integrated analyses across multiple domains. Triangulating evidence from observational and genetic analyses helps mitigate the biases inherent in each study design, providing a multi-perspective evaluation of associations. The MRS algorithm was developed independently of the NHS II cohort, reducing the possibility of overfitting or circular reasoning in our analyses. Developed using standard digital mammograms, sophisticated statistical methods, and large-scale populations, the MRS itself represents an advanced texture feature with significant potential in clinical settings.

To conclude, this study provides initial insights into the etiologic underpinnings of MRS. We validated MRS as a robust predictor of breast cancer risk that provides information independent of, and beyond, that captured by traditional density measurements. Our investigation further revealed robust associations between breast density measures and MRS and suggested a potential impact of central obesity on MRS. Future research should encompass larger-scale studies to definitively characterize these associations and elucidate the underlying biological mechanisms. As our understanding of mammographic texture features advances, tools like MRS that offer a more nuanced view of risk beyond conventional measures may become integral to personalized breast cancer assessment and prevention strategies.

Methods

Study participants

The current study leverages resources from the NHS II, a large prospective cohort established in 1989 with 116,429 female and predominantly White (>90%) registered nurses aged 25–42 from 14 states³². Between 1996 and 1999, blood samples were collected from 29,611 women, forming a blood subcohort³³. Genotype data from four platforms (Affymetrix 6.0, Illumina HumanHap, Illumina OmniExpress, and Illumina OncoArray) imputed to the 1000 Genomes Phase 3 version 5 reference panel were used in this study. Pre-diagnostic screening mammograms, conducted as close as possible to the blood draw date, were collected as part of a breast cancer case-control study nested within the blood subcohort. Participants have been followed up biennially through self-administered questionnaires to update exposure information and disease diagnoses. For this study, we initially included 853 women (292 cases and 561 controls) with eligible full-field digital mammograms. Among these, 383 women (143 cases and 240 controls) had available imputed genotype data and were included in the genetic analyses. Detailed descriptions of the full genotyping and quality control pipeline³⁴, as well as the mammogram collection and processing procedure^21,24, are available in previous publications. Cohort participants provided written informed consent. The study protocol was approved by the institutional review boards of the Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health, and those of participating registries as required.

Risk factors measurement

Information on various established risk factors for breast cancer was collected for NHS II women. These factors included early life and adult body size, fat distribution, height, reproductive characteristics, and family history of breast cancer. Early life body size, WHR, height (inches), and age at menarche were reported via the baseline questionnaire in 1989. Body sizes at ages 5 and 10 years were recalled using Stunkard’s nine-level pictogram (levels 1–9: most lean to most overweight)³⁵. The average of these two measurements was used to represent early life body size. For other covariates, we used the most recent information from the biennial questionnaires preceding the date of the mammogram. These covariates included: BMI, age at first birth, menopausal status, age at natural menopause, current postmenopausal hormone use, parity (number of pregnancies ≥6 months), history of benign breast disease, and family history of breast cancer. BMI (kg/m²) was calculated by dividing weight (kg) by the square of baseline height (m). WHRadjBMI was further calculated by regressing WHR on BMI and using the residuals from this regression.
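The derived variables described above (early life body size, BMI, and WHRadjBMI) can be sketched in a few lines. This is a minimal illustration assuming plain Python lists of per-woman values; the example measurements are hypothetical, the helper names are ours rather than the study's, and heights reported in inches are converted with 0.0254 m per inch.

```python
from statistics import fmean


def early_life_body_size(size_age5: int, size_age10: int) -> float:
    """Average of recalled Stunkard pictogram levels (1 = most lean, 9 = most overweight)."""
    for level in (size_age5, size_age10):
        if not 1 <= level <= 9:
            raise ValueError("Stunkard pictogram levels range from 1 to 9")
    return (size_age5 + size_age10) / 2


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def residualize(y: list[float], x: list[float]) -> list[float]:
    """Residuals from an ordinary least-squares regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical values for five women (not NHS II data): weight in kg, height in inches
whr = [0.72, 0.80, 0.76, 0.85, 0.78]
bmi_values = [bmi(w, h * 0.0254) for w, h in [(58, 64), (75, 65), (66, 66), (88, 63), (70, 67)]]
whr_adj_bmi = residualize(whr, bmi_values)  # WHR with the linear effect of BMI removed
print([round(r, 4) for r in whr_adj_bmi])
```

By construction the residuals sum to zero and are uncorrelated with BMI, which is what makes WHRadjBMI a measure of fat distribution independent of overall body mass.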

We also assessed predicted BI-RADS density using a previously developed deep learning algorithm that predicts mammographic breast density from digital mammograms³¹. The algorithm assigns breasts to BI-RADS density categories from a (almost entirely fatty) to d (extremely dense), closely agreeing with experienced mammographers’ evaluations (weighted κ for agreement with radiologists = 0.85). We coded these categories as 1, 2, 3, and 4, with higher numbers indicating denser breasts. Predicted BI-RADS density was assessed on the same digital mammograms used for MRS calculation. Other quantitative density measures, such as absolute dense area and percent density, were not directly measured for the NHS II participants included in this study.

Mammogram risk score measurement

The MRS is an AI-derived score capturing the texture information embedded in whole digital mammograms, each represented by millions of pixels^5,6. It was developed using 220,868 mammograms from 10,126 racially diverse, initially cancer-free women in the WashU cohort³⁶, of whom 505 developed breast cancer during follow-up. Validation was performed using 150,352 mammograms from 15,885 women in EMBED, demonstrating consistently robust predictive performance (5-year AUC = 0.74)⁶. The algorithm, previously described in detail⁶, takes all standard mammogram views (CC and/or mediolateral oblique) from both breasts as input, optionally together with additional clinical risk factors. Its outputs include the MRS (a transparent weighted sum of feature coefficients), the probability of breast cancer onset within 5 years, and each woman’s relative risk, which can be used for risk calibration. For the current study, we generated an MRS for each of the 853 women by applying the algorithm to her pair of digital CC-view mammograms (one from each breast), totaling 1706 images. We used the earliest digital mammograms available for each woman.

Genetic variants and polygenic scores for risk factors

We selected the largest available GWAS conducted among women of European ancestry for early life and adult body size, WHR, WHRadjBMI, height, age at menarche, age at first birth, age at natural menopause, number of children ever born, dense area, non-dense area, and percent density. For each risk factor, we collected lists of genetic variants reported as genome-wide significant (P < 5.0 × 10⁻⁸) in the original female-specific GWAS, along with their beta coefficients. When such variants were not explicitly reported, we applied PLINK’s clumping function³⁷ (parameters: P < 5.0 × 10⁻⁸, linkage disequilibrium r² < 0.001 within a 10 Mb window) to obtain this information. For height, for which no female-specific GWAS is known to be publicly accessible, we used genetic variant information from the largest available sex-combined GWAS. This approach was justified as no statistically significant evidence for sex differences in height genetics has been reported³⁸.
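The clumping step can be sketched as a greedy procedure: rank genome-wide significant variants by P value, keep the strongest, and discard any later variant on the same chromosome within 10 Mb that has r² ≥ 0.001 with an already-kept variant. This is a simplified illustration, not PLINK's implementation; PLINK estimates r² from a genotype reference panel, whereas here the LD lookup, variant IDs, and positions are hypothetical.

```python
from typing import Callable, NamedTuple


class Variant(NamedTuple):
    rsid: str
    chrom: int
    pos: int      # base-pair position
    pval: float


def clump(variants: list[Variant], r2: Callable[[str, str], float],
          p_threshold: float = 5e-8, r2_threshold: float = 0.001,
          window_bp: int = 10_000_000) -> list[Variant]:
    """Greedy LD clumping: keep the most significant variant, drop its LD proxies, repeat."""
    candidates = sorted((v for v in variants if v.pval < p_threshold), key=lambda v: v.pval)
    index_variants: list[Variant] = []
    for v in candidates:
        in_ld = any(
            u.chrom == v.chrom
            and abs(u.pos - v.pos) <= window_bp
            and r2(u.rsid, v.rsid) >= r2_threshold
            for u in index_variants
        )
        if not in_ld:
            index_variants.append(v)
    return index_variants


# Hypothetical pairwise r^2 values (not from any real LD reference panel)
LD = {frozenset({"rs1", "rs2"}): 0.40, frozenset({"rs1", "rs3"}): 0.0005}


def lookup(a: str, b: str) -> float:
    return LD.get(frozenset({a, b}), 0.0)


variants = [
    Variant("rs1", 1, 1_000_000, 1e-12),
    Variant("rs2", 1, 1_500_000, 1e-10),   # r^2 = 0.40 with rs1: clumped away
    Variant("rs3", 1, 2_000_000, 1e-9),    # r^2 below 0.001 with rs1: kept
    Variant("rs4", 2, 1_000_000, 1e-3),    # not genome-wide significant
]
print([v.rsid for v in clump(variants, lookup)])  # ['rs1', 'rs3']
```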

To ensure these established variants were reliably imputed in the NHS II data, we included only those with matching alleles, non-ambiguous SNPs, a minimum imputation score >0.3 across all genotyping platforms, and a minor allele frequency >0.005. These selected variants were used for the PGS calculation and as IVs in causal inference analyses. Detailed information on GWAS sources and quality control of genetic variants is provided in Supplementary Table 17.
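The variant filters above translate directly into a predicate. This sketch assumes that "matching alleles" means the GWAS allele pair equals the allele pair in the target data (strand flips are not handled) and that ambiguous SNPs are the palindromic A/T and C/G pairs; the record layout, platform labels, and variant IDs are hypothetical.

```python
from typing import NamedTuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class Candidate(NamedTuple):
    rsid: str
    effect_allele: str           # allele whose effect the GWAS beta refers to
    other_allele: str
    target_alleles: frozenset    # alleles present in the imputed target data
    info_by_platform: dict       # imputation quality score per genotyping platform
    maf: float                   # minor allele frequency in the target data


def is_ambiguous(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs cannot be strand-aligned from allele codes alone."""
    return COMPLEMENT.get(a1) == a2


def passes_qc(v: Candidate, min_info: float = 0.3, min_maf: float = 0.005) -> bool:
    if {v.effect_allele, v.other_allele} != set(v.target_alleles):
        return False                         # alleles do not match the target data
    if is_ambiguous(v.effect_allele, v.other_allele):
        return False                         # palindromic SNP
    if min(v.info_by_platform.values()) <= min_info:
        return False                         # poorly imputed on at least one platform
    return v.maf > min_maf


platforms = ("affy6", "humanhap", "omniexpress", "oncoarray")  # hypothetical labels
good_info = dict.fromkeys(platforms, 0.9)
candidates = [
    Candidate("rs_a", "A", "G", frozenset("AG"), good_info, 0.21),
    Candidate("rs_b", "A", "T", frozenset("AT"), good_info, 0.30),   # ambiguous
    Candidate("rs_c", "C", "T", frozenset("CT"), {**good_info, "humanhap": 0.25}, 0.15),
    Candidate("rs_d", "G", "A", frozenset("AG"), good_info, 0.003),  # too rare
    Candidate("rs_e", "C", "T", frozenset("AC"), good_info, 0.12),   # allele mismatch
]
print([v.rsid for v in candidates if passes_qc(v)])  # ['rs_a']
```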

Statistical analysis

We generated descriptive statistics for all variables. Continuous variables were described using mean and SD, while categorical variables were described using frequency and percentage. We assessed differences between higher and lower MRS groups (using the median value as the cutoff) using Student’s t-test or Wilcoxon rank-sum test for continuous variables and Chi-square test for categorical variables.
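For categorical variables, the median-split comparison reduces to a Pearson chi-square test on a contingency table. The sketch below computes the statistic without Yates's continuity correction (R's chisq.test applies that correction to 2 × 2 tables by default, so results can differ) and assigns ties at the median to the lower group, which is our assumption. The eight MRS values are toy data, far too few for the chi-square approximation to be reliable.

```python
from statistics import median


def median_split(values: list[float]) -> list[bool]:
    """True for values above the median (ties at the median go to the lower group)."""
    cut = median(values)
    return [v > cut for v in values]


def pearson_chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for an r x c table, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical MRS values and case status for eight women (not NHS II data)
mrs = [0.1, -0.5, 1.2, 0.8, -1.1, 0.3, 2.0, -0.2]
is_case = [False, False, True, True, False, False, True, False]
above = median_split(mrs)
# Rows: above / below median MRS; columns: cases / controls
table = [
    [sum(a and c for a, c in zip(above, is_case)), sum(a and not c for a, c in zip(above, is_case))],
    [sum(not a and c for a, c in zip(above, is_case)), sum(not a and not c for a, c in zip(above, is_case))],
]
print(table, round(pearson_chi_square(table), 2))  # [[3, 1], [0, 4]] 4.8
```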

We first validated the association between MRS and breast cancer in NHS II using logistic regression, after excluding 91 cases diagnosed within 6 months after the mammogram used for MRS calculation. To evaluate the associations between breast cancer risk factors and MRS, four main analyses were performed: (i) linear regressions of MRS on each observed risk factor to quantify their observational association without accounting for genetic predisposition; (ii) linear regressions of MRS on the PGS for each risk factor to evaluate the relationship between genetic predisposition to each risk factor and MRS; (iii) MR analysis via 2SLS regressions of MRS on each genetically predicted risk factor; and (iv) two-sample MR of MRS using GWAS summary statistics for each risk factor, to evaluate potential causal associations. For all analyses, we standardized MRS and all non-binary variables to facilitate comparison across risk factors. Binary variables included postmenopausal hormone use, history of benign breast disease, and family history of breast cancer, each categorized as “Yes” or “No.” In two-sample MR, we retained the original scale of genetic associations from the source GWAS.
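Standardizing both variables makes each linear-regression coefficient read as "SD difference in MRS per SD difference in the risk factor". In a model with a single standardized predictor, that coefficient equals the Pearson correlation, which the sketch below checks on hypothetical values; once covariates such as age at mammogram are added, the two quantities no longer coincide exactly. Binary variables would be left coded 0/1 rather than standardized.

```python
from statistics import fmean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Z-scores: subtract the mean and divide by the (population) standard deviation."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def pearson_r(x: list[float], y: list[float]) -> float:
    zx, zy = standardize(x), standardize(y)
    return fmean([a * b for a, b in zip(zx, zy)])


# Hypothetical values for eight women (not NHS II data)
birads = [1, 2, 2, 3, 4, 3, 1, 2]          # predicted BI-RADS category coded 1-4
mrs = [-0.4, 0.1, 0.3, 0.5, 1.2, 0.2, -0.8, 0.0]
beta_sd = ols_slope(standardize(birads), standardize(mrs))  # SD change in MRS per SD of density
print(round(beta_sd, 3), round(pearson_r(birads, mrs), 3))
```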

For each risk factor, we calculated its weighted PGS using PLINK’s “--score” function³⁷, summing the products of effect allele dosage and corresponding beta coefficient across all selected genetic variants for each woman. Prior to 2SLS regression, we assessed instrument strength by regressing each risk factor on its corresponding PGS, obtaining F-statistics and correlation coefficient estimates. To minimize weak instrument bias, we excluded PGS with an F-statistic < 5 or a correlation P > 0.05 from the 2SLS analysis. The 2SLS procedure involved two stages: first, regressing each risk factor on its PGS; second, using the predicted values as independent variables in a regression model with MRS as the dependent variable.
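The PGS and 2SLS steps can be illustrated with simulated data. This is a minimal sketch under stated assumptions: a single PGS instrument, no covariates (the study additionally adjusted for age, genotyping platform, principal components, and case-control status), and hypothetical variant effects and allele frequencies. It reproduces only the 2SLS point estimate; standard errors from a naive second-stage regression are incorrect, which is why dedicated software such as the R package ivreg is used in practice.

```python
import random
from statistics import fmean


def weighted_pgs(dosages: list[float], betas: list[float]) -> float:
    """Weighted PGS: sum of effect-allele dosage (0-2) times GWAS beta across variants."""
    return sum(d * b for d, b in zip(dosages, betas))


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Intercept and slope of a simple least-squares regression of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def first_stage_f(z: list[float], x: list[float]) -> float:
    """F-statistic of a one-instrument first stage: R^2 (n - 2) / (1 - R^2)."""
    a, b = ols_fit(z, x)
    mx = fmean(x)
    ss_tot = sum((v - mx) ** 2 for v in x)
    ss_res = sum((v - (a + b * zi)) ** 2 for zi, v in zip(z, x))
    r2 = 1 - ss_res / ss_tot
    return r2 * (len(x) - 2) / (1 - r2)


def two_stage_least_squares(z: list[float], x: list[float], y: list[float]) -> float:
    """Point estimate only: stage 1 predicts x from z, stage 2 regresses y on the prediction."""
    a, b = ols_fit(z, x)
    x_hat = [a + b * zi for zi in z]      # genetically predicted exposure
    return ols_fit(x_hat, y)[1]


# Simulated data (hypothetical effect sizes, not NHS II estimates)
rng = random.Random(2025)
n, true_effect = 20_000, 0.3
snp_betas = [0.40, 0.30, 0.25, 0.35, 0.20]
allele_freqs = [0.30, 0.45, 0.20, 0.35, 0.50]
pgs, exposure, outcome = [], [], []
for _ in range(n):
    dosages = [sum(rng.random() < p for _ in range(2)) for p in allele_freqs]
    g = weighted_pgs(dosages, snp_betas)
    u = rng.gauss(0, 1)                    # unmeasured confounder of exposure and outcome
    x = g + u + rng.gauss(0, 1)
    y = true_effect * x + u + rng.gauss(0, 1)
    pgs.append(g)
    exposure.append(x)
    outcome.append(y)

f_stat = first_stage_f(pgs, exposure)
beta_ols = ols_fit(exposure, outcome)[1]
beta_2sls = two_stage_least_squares(pgs, exposure, outcome)
print(f"F = {f_stat:.0f}; naive OLS = {beta_ols:.2f}; 2SLS = {beta_2sls:.2f}")
```

In this simulation the confounder biases the ordinary regression of outcome on exposure upward (to roughly 0.75), while the 2SLS estimate should land near the simulated effect of 0.3; the first-stage F comfortably exceeds the F < 5 exclusion threshold.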

For two-sample MR, we obtained the “IV-exposure” associations directly from the corresponding GWAS. The “IV-outcome” associations were estimated from the NHS II dataset using PLINK’s “--glm” function³⁷; for our primary analysis, this was performed in the full genetic dataset (N = 383) with adjustment for case-control status. Our primary method was the random-effects IVW approach³⁹, which constrains the intercept to zero and estimates the causal effect using random-effects meta-analysis. To validate MR model assumptions²⁰ and assess the robustness of our findings, we applied complementary methods including MR-Egger regression (which detects and accounts for directional pleiotropy)⁴⁰, weighted median (which gives consistent estimates if at least 50% of the weight comes from valid instruments)⁴¹, weighted mode (which identifies the causal effect estimate most consistent across all variants)⁴², and IVW excluding outlier SNPs detected using Radial MR’s iterative Cochran’s Q method⁴³. We considered a causal association significant if it reached statistical significance in the IVW analysis and maintained a consistent direction across all sensitivity analyses. Following two-sample MR, we performed two additional analyses: an MR-Clust analysis to cluster genetic variants with similar causal estimates, as distinct clusters may reflect heterogeneous causal mechanisms⁴⁴, and a Chi-square test on Wald ratios estimated in two-sample MR across all IVs for each risk factor to test whether any of the risk factor-associated genetic variants are associated with MRS.
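With first-order weights w_j = 1/se(β_Yj)², the IVW estimate is Σ w_j β_Xj β_Yj / Σ w_j β_Xj², equivalent to a weighted regression of variant-outcome on variant-exposure associations through the origin. The sketch below assumes a multiplicative random-effects model, in which the fixed-effect standard error is scaled up by √(Q/(L−1)) whenever Cochran's Q exceeds its degrees of freedom; we believe this mirrors how TwoSampleMR's IVW method handles heterogeneity, but the package documentation should be treated as authoritative. The Wald-ratio chi-square function is one plausible reading of the test described above, not the authors' code, and all summary statistics are hypothetical.

```python
import math


def ivw_random_effects(b_exp, b_out, se_out):
    """IVW estimate with multiplicative random-effects SE and Cochran's Q (first-order weights)."""
    w = [1.0 / s ** 2 for s in se_out]
    den = sum(wi * bx ** 2 for wi, bx in zip(w, b_exp))
    beta = sum(wi * bx * by for wi, bx, by in zip(w, b_exp, b_out)) / den
    se_fixed = 1.0 / math.sqrt(den)
    # Cochran's Q: weighted squared deviations of variant-outcome effects from the fitted line
    q = sum(wi * (by - beta * bx) ** 2 for wi, bx, by in zip(w, b_exp, b_out))
    df = len(b_exp) - 1
    phi = q / df if df > 0 else 1.0
    # Inflate (never deflate) the SE when there is excess heterogeneity
    se = se_fixed * max(1.0, math.sqrt(phi))
    return beta, se, q


def wald_ratio_chi_square(b_exp, b_out, se_out):
    """Sum of squared Wald-ratio z-scores; compare with a chi-square on len(b_exp) df."""
    return sum(((by / bx) / (so / abs(bx))) ** 2 for bx, by, so in zip(b_exp, b_out, se_out))


# Hypothetical summary statistics for four variants (not the study's GWAS data)
b_exp = [0.10, 0.08, 0.12, 0.05]
b_out = [0.05, 0.04, 0.06, 0.025]
se_out = [0.02, 0.02, 0.02, 0.02]
beta, se, q = ivw_random_effects(b_exp, b_out, se_out)
lo, hi = beta - 1.96 * se, beta + 1.96 * se
print(f"IVW beta = {beta:.3f} (95% CI {lo:.3f} to {hi:.3f}); Q = {q:.3f}")
print(f"Wald-ratio chi-square = {wald_ratio_chi_square(b_exp, b_out, se_out):.4f}")
```

With first-order standard errors, each Wald-ratio z-score equals the variant-outcome z-score β_Yj / se(β_Yj), so the chi-square statistic tests whether any instrument is associated with MRS. A P value can then be taken from the chi-square survival function, for example scipy.stats.chi2.sf(statistic, df).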

To mitigate confounding, we employed three adjustment sets across all analyses. The crude model included age at mammogram and, where appropriate, genotyping platform and the top 10 genetic principal components. The second and third sets were additionally adjusted for menopausal status and predicted BI-RADS density, respectively. Because investigating MRS in a case-control study implicitly conditions on breast cancer status, which can introduce ascertainment bias, we conducted all analyses using two approaches as previously recommended³⁰: (1) including case-control status as an additional covariate (our primary approach, which preserves sample size), and (2) restricting analyses to controls only. 2SLS and two-sample MR analyses were conducted using the R packages “ivreg”, “TwoSampleMR”, and “RadialMR” (R v4.1.0). Given the relatively limited sample size and the exploratory nature of our study, we used the conventional threshold of P < 0.05 to define statistical significance.

Data availability

The data that support the findings of this study are available from the Nurses’ Health Studies; however, they are not publicly available. Investigators interested in using the data can request access, and feasibility will be discussed at an investigator’s meeting. Limits are not placed on scientific questions or methods, and there is no requirement for co-authorship. Additional data sharing information and policy details can be accessed at http://www.nurseshealthstudy.org/researchers. All GWAS summary statistics used in this study are publicly available.

Code availability

Analysis scripts used to generate the results of this study are available from the corresponding author upon reasonable request.

References

Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).
Vilmun, B. M. et al. Impact of adding breast density to breast cancer risk models: a systematic review. Eur. J. Radiol. 127, https://doi.org/10.1016/J.EJRAD.2020.109019 (2020).
Gastounioti, A., Conant, E. F. & Kontos, D. Beyond breast density: a review on the advancing role of parenchymal texture analysis in breast cancer risk assessment. Breast Cancer Res. 18, https://doi.org/10.1186/S13058-016-0755-8 (2016).
Anandarajah, A. et al. Studies of parenchymal texture added to mammographic breast density and risk of breast cancer: a systematic review of the methods used in the literature. Breast Cancer Res. 24, https://doi.org/10.1186/S13058-022-01600-5 (2022).
Jiang, S., Bennett, D. L. & Colditz, G. A. Deriving a mammogram-based risk score from screening digital breast tomosynthesis for 5-year breast cancer risk prediction. Cancer Prev. Res. (Phila.) 18, 347–354 (2025).
Jiang, S., Bennett, D., Rosner, B., Tamimi, R. & Colditz, G. Development and validation of a 5-year dynamic risk model using repeated mammograms. JCO Clin. Cancer Inform. 8, e2400200 (2024).
Loibl, S., Poortmans, P., Morrow, M., Denkert, C. & Curigliano, G. Breast cancer. Lancet 397, 1750–1769 (2021).
Hopper, J. L. et al. Childhood body mass index and adult mammographic density measures that predict breast cancer risk. Breast Cancer Res. Treat. 156, 163–170 (2016).
Andersen, Z. J. et al. Birth weight, childhood body mass index, and height in relation to mammographic density and breast cancer: a register-based cohort study. Breast Cancer Res. 16, https://doi.org/10.1186/BCR3596 (2014).
Barnard, M. E. et al. Body mass index and mammographic density in a multiracial and multiethnic population-based study. Cancer Epidemiol. Biomark. Prev. 31, 1313–1323 (2022).
Boyd, N. F. et al. Body size, mammographic density, and breast cancer risk. Cancer Epidemiol. Biomark. Prev. 15, 2086–2092 (2006).
Dite, G. S. et al. Predictors of mammographic density: insights gained from a novel regression analysis of a twin study. Cancer Epidemiol. Biomark. Prev. 17, 3474–3481 (2008).
Ward, S. V. et al. The association of age at menarche and adult height with mammographic density in the International Consortium of Mammographic Density. Breast Cancer Res. 24, https://doi.org/10.1186/S13058-022-01545-9 (2022).
El-Bastawissi, A. Y., White, E., Mandelson, M. T. & Taplin, S. H. Reproductive and hormonal factors associated with mammographic breast density by age (United States). Cancer Causes Control 11, 955–963 (2000).
Yaghjyan, L., Colditz, G. A., Rosner, B., Bertrand, K. A. & Tamimi, R. M. Reproductive factors related to childbearing and mammographic breast density. Breast Cancer Res. Treat. 158, 351–359 (2016).
Rice, M. S. et al. Reproductive and lifestyle risk factors and mammographic density in Mexican women. Ann. Epidemiol. 25, 868–873 (2015).
Yaghjyan, L. et al. Relationship between breast cancer risk factors and mammographic breast density in the Fernald Community Cohort. Br. J. Cancer 106, 996–1003 (2012).
Haas, C. B. et al. Disentangling the relationships of body mass index and circulating sex hormone concentrations in mammographic density using Mendelian randomization. Breast Cancer Res. Treat. 206, 295–305 (2024).
Vabistsevits, M. et al. Mammographic density mediates the protective effect of early-life body size on breast cancer risk. Nat. Commun. 15, 1–15 (2024).
Smith, G. D. & Ebrahim, S. “Mendelian randomization”: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).
Oh, H. et al. Early-life and adult anthropometrics in relation to mammographic image intensity variation in the nurses’ health studies. Cancer Epidemiol. Biomark. Prev. 29, 343–351 (2020).
Liu, Y. et al. A genome-wide association study of mammographic texture variation. Breast Cancer Res. 24, 1–15 (2022).
Heine, J. J. et al. A novel automated mammographic density measure and breast cancer risk. J. Natl. Cancer Inst. 104, 1028–1037 (2012).
Warner, E. T. et al. Automated percent mammographic density, mammographic texture variation, and risk of breast cancer: a nested case-control study. npj Breast Cancer 7, https://doi.org/10.1038/S41523-021-00272-2 (2021).
Manduca, A. et al. Texture features from mammographic images and risk of breast cancer. Cancer Epidemiol. Biomark. Prev. 18, 837 (2009).
Wanders, J. O. P. et al. The combined effect of mammographic texture and density on breast cancer risk: a cohort study. Breast Cancer Res. 20, https://doi.org/10.1186/S13058-018-0961-7 (2018).
Watt, G. P. et al. Mammographic texture features associated with contralateral breast cancer in the WECARE Study. npj Breast Cancer 7, 1–4 (2021).
Malkov, S. et al. Mammographic texture and risk of breast cancer by tumor type and estrogen receptor status. Breast Cancer Res. 18, 1–11 (2016).
Ye, Z. et al. Causal relationships between breast cancer risk factors based on mammographic features. Breast Cancer Res. 25, 127 (2023).
Monsees, G. M., Tamimi, R. M. & Kraft, P. Genome-wide association scans for secondary traits using case-control samples. Genet. Epidemiol. 33, 717 (2009).
Lehman, C. D. et al. Mammographic breast density assessment using deep learning: clinical implementation. Radiology 290, 52–58 (2019).
Colditz, G. A. & Hankinson, S. E. The Nurses’ Health Study: lifestyle and health among women. Nat. Rev. Cancer 5, 388–396 (2005).
Tworoger, S. S. et al. The association of plasma DHEA and DHEA sulfate with breast cancer risk in predominantly premenopausal women. Cancer Epidemiol. Biomark. Prev. 15, 967–971 (2006).
Lindström, S. et al. A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts. PLoS ONE 12, e0173997 (2017).
Stunkard, A. J., Sørensen, T. & Schulsinger, F. Use of the Danish Adoption Register for the study of obesity and thinness. Res. Publ. Assoc. Res. Nerv. Ment. Dis. 60, 115–120 (1983).
Colditz, G. A. et al. Joanne Knight Breast Health Cohort at Siteman Cancer Center. Cancer Causes Control 33, 623 (2022).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559 (2007).
Randall, J. C. et al. Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet. 9, https://doi.org/10.1371/JOURNAL.PGEN.1003500 (2013).
Burgess, S., Scott, R. A., Timpson, N. J., Smith, G. D. & Thompson, S. G. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur. J. Epidemiol. 30, 543–552 (2015).
Burgess, S. & Thompson, S. G. Interpreting findings from Mendelian randomization using the MR-Egger method. Eur. J. Epidemiol. 32, 377 (2017).
Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314 (2016).
Hartwig, F. P., Smith, G. D. & Bowden, J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int. J. Epidemiol. 46, 1985–1998 (2017).
Bowden, J. et al. Improving the visualization, interpretation and analysis of two-sample summary data Mendelian randomization via the Radial plot and Radial regression. Int. J. Epidemiol. 47, 1264–1278 (2018).
Foley, C. N., Mason, A. M., Kirk, P. D. W. & Burgess, S. MR-Clust: clustering of genetic variants in Mendelian randomization with similar causal estimates. Bioinformatics 37, 531–541 (2021).

Acknowledgements

The Nurses’ Health Study II is supported by the National Cancer Institute (U01CA176726 and R01CA67262). This research was supported in part by the Intramural Research Program of the National Institutes of Health (NIH). The contributions of the NIH authors are considered Works of the United States Government. The findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services. The authors would like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention’s National Program of Cancer Registries (NPCR) and/or the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program. Central registries may also be supported by state agencies, universities, and cancer centers. Participating central cancer registries include the following: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Indiana, Iowa, Kentucky, Louisiana, Massachusetts, Maine, Maryland, Michigan, Mississippi, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico, Rhode Island, Seattle SEER Registry, South Carolina, Tennessee, Texas, Utah, Virginia, West Virginia, Wyoming.

Funding

Open access funding provided by the National Institutes of Health.

Author information

Authors and Affiliations

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
Xueyao Wu & Peter Kraft
Washington University School of Medicine in St. Louis, St. Louis, MO, USA
Shu Jiang & Graham Colditz
University of Maryland School of Medicine, Baltimore, MD, USA
Aaron Ge
Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Constance Turman
Population Health Sciences Department, Weill Cornell Medical School, New York, NY, USA
Rulla M. Tamimi
Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Rulla M. Tamimi

Contributions

P.K. and X.W. conceived and designed the study. R.M.T. prepared the phenotype data for NHS II. C.T. prepared the genotype data for NHS II. S.J. and G.C. prepared the mammogram risk score for NHS II. X.W. prepared the risk factor GWAS data. X.W. analyzed the data with the assistance of A.G. and C.T. X.W. and P.K. interpreted the results, with substantial input and comments from R.M.T., S.J., and G.C. X.W. was a major contributor in writing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Peter Kraft.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wu, X., Jiang, S., Ge, A. et al. Investigating the relationship between breast cancer risk factors and an AI-generated mammographic texture feature in the Nurses’ Health Study II. npj Breast Cancer 12, 5 (2026). https://doi.org/10.1038/s41523-025-00870-4

Received: 25 February 2025
Accepted: 18 November 2025
Published: 23 December 2025
Version of record: 07 January 2026
DOI: https://doi.org/10.1038/s41523-025-00870-4