Main

The physiological and environmental forces leading to cancers with specific genotypes are largely unknown. Even as oncogenic mutations emerge in healthy tissue, a confluence of physiological, immunological and epigenetic changes ultimately elicits tumorigenesis1. Obesity is a risk factor for the development of numerous cancers2,3 and growing evidence suggests that medical interventions to reduce obesity can reduce cancer risk4. Obesity itself is associated with changes in systemic immune surveillance, metabolism and inflammation5,6,7. In composite, these changes have the potential to shape the selective pressure for specific driver mutations in cancer, connecting the evolution of tumor-intrinsic genotypes to aspects of systemic health.

Testing the hypothesis that body mass index (BMI, a quantitative surrogate of obesity status) associates with specific tumor genotypes at population scale requires jointly collected clinical and genomic data, which is largely absent in large-scale cancer genomics datasets to date. We systematically extracted information on BMI and other demographic factors in 34,274 patients profiled as part of their routine clinical care by the Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) clinical sequencing platform8 (Extended Data Fig. 1). We assessed the statistical association between BMI and tumor genotype by modeling the incidence of mutations in 341 cancer-associated genes as a function of BMI, focusing on gene–cancer type pairs with sufficient numbers of mutant patients for adequate statistical power (Methods). Considering only oncogenic mutations as annotated by a Food and Drug Administration-recognized database9, we identified six genes across three separate cancer types demonstrating statistically significant enrichment with BMI in specific cancer types (q < 0.05) (Fig. 1a and Supplementary Table 1). In lung adenocarcinoma, three genes, KRAS (q = 2.6 × 10−5, estimate = 0.03), SETD2 (a tumor-suppressive histone methyltransferase, q = 1.31 × 10−2, estimate = 0.06) and PPP2R1A (a protein phosphatase known to regulate a variety of pathways, including phosphoinositide 3-kinase signaling, q = 1.31 × 10−2, estimate = 0.16), were mutated at a higher frequency in patients with obesity and one (EGFR q = 3.0 × 10−10, estimate = −0.05) was mutated at a lower frequency in the same patient group, although the number of patients with SETD2 and PPP2R1A mutations was small (Table 1). In addition, we found BAP1 in cancers of unknown primaries to be positively associated with BMI (q = 3.5 × 10−2, estimate = 0.15) and POLE in uterine endometrioid carcinoma to be negatively associated with BMI (q = 1.6 × 10−2, estimate = −0.08). POLE was similarly associated with a low BMI in a cohort restricted to only patients with endometrioid carcinoma (P = 6 × 10−3, estimate = −0.09). In contrast, we found no statistically significant associations between the presence of silent mutations (under putatively neutral selection) and BMI at the gene–cancer type level (q < 0.05) (Extended Data Fig. 2). We observed no statistically significant associations between copy number alterations and BMI.

Fig. 1: Oncogenic mutations are associated with BMI.
Fig. 1: Oncogenic mutations are associated with BMI.The alternative text for this image may have been generated using AI.
Full size image

a, Statistical association between continuous BMI and genotype across gene–cancer type pairs. The −log10(P values) and estimated sizes from univariate logistic regression are on the y and x axes, respectively. Statistically significant pairs are in black. b, Multivariate regression demonstrating that BMI categories (underweight, BMI < 18.5 kg m−2; healthy, 18.5 ≤ BMI < 25 kg m−2; overweight, 25 ≤ BMI < 30 kg m−2; obese, BMI ≥ 30 kg m−2) are associated with KRAS mutations independent of other clinical factors. Results for multivariate regression with BMI as a continuous variable are shown in Supplementary Table 3. Error bars represent the 95% confidence interval (CI). c, KRAS mutation frequency in patients with lung adenocarcinoma categorized by BMI and genetic ancestry. ASJ, Ashkenazi Jewish; EAS, East Asian; EUR, European. Error bars represent the s.e. d, EGFR (top) and KRAS (bottom) mutation frequency in BMI categories in the MSKCC cohort. Error bars represent the s.e. e, EGFR (top) and KRAS (bottom) mutation frequency in BMI categories in the DFCI cohort. Error bars represent the s.e. f, EGFR (top) and KRAS (bottom) are not associated with weight loss before cancer diagnosis in lung adenocarcinoma using the χ2 test.

Table 1 Patient distribution by BMI and genotype with relative risk

Driver mutations are known to accumulate at different rates in patients of distinct genetic ancestries, raising the possibility that an association between tumor genotypes and obesity status could emerge indirectly10. To control for the contributions of genetic ancestry, we obtained ancestry information for each patient from a previously reported computational analysis of ancestry from germline sequencing data11. After correcting for age, sex, genetic ancestry and mutational burden, all six significant univariate associations between BMI and genotype remained statistically significant in the same direction, although the effect in some individual ancestries (for example, patients of Ashkenazic ancestry harboring KRAS mutations) was attenuated (Fig. 1c and Supplementary Table 1). Thus, elevated BMI is associated with specific cancer genotypes independent of genetic ancestry, sex, age and tumor mutational burden (TMB).

We further considered the possibility that other demographic factors, etiologies or exposures might confound an association between BMI and tumor genotypes. To directly examine this possibility, we focused on lung adenocarcinoma, where there is an established association between smoking status and elevated frequency of KRAS mutations/reduced frequency of EGFR mutations12. We modeled the presence of driver mutations in KRAS and EGFR as a function of BMI, clinical smoking history, TMB, ancestry, age and sex, for 4,150 patients with lung adenocarcinoma with adequate genomic and clinical data. The association between BMI and EGFR (P = 7.4 × 10−5, estimate = −0.04) and BMI and KRAS (P = 1 × 10−3, estimate = 0.03) remained statistically significant after controlling for all covariates (Fig. 1b–d and Supplementary Table 3). We also controlled for smoking quantitatively using pack-years and genomically using a previously published, computationally inferred, smoking signature13 and associations between BMI and EGFR and KRAS remained statistically significant (pack-years: EGFR = 0.008, KRAS = 0.02; smoking signature: EGFR = 7.4 × 10−5, KRAS = 0.001). As smoking is associated with C/G > A/T substitutions common among KRAS drivers, we repeated the above analysis in KRAS separately for C/G > A/T mutations and non-C/G > A/T mutations. We found a consistent positive association in KRAS across both subsetted groups (C > A estimate = 0.02, P = 0.02, non-C > A estimate = 0.03, P = 4 × 10−3). We also considered the possibility that socioeconomic status or exposure to PM2.5 air pollutants could potentially explain the association between genotype and obesity. All six statistically significant genotype–obesity associations remained statistically significant after controlling for socioeconomic status (via the Yost index), air pollution (https://www.breezometer.com/air-quality-map/) or both (Supplementary Table 4).

To corroborate these findings in an independent cohort, we obtained somatic mutation calls and BMI information from 2,727 patients with lung adenocarcinoma at a separate institution. Analysis of this dataset confirmed that EGFR mutations were negatively associated with BMI (P = 5 × 10−2, estimate = −0.02) and KRAS and SETD2 mutations were positively associated with BMI (P = 9 × 10−4, estimate = 0.03; P = 0.01, estimate = 0.04, respectively) (Fig. 1e and Supplementary Table 5), after controlling for age, sex and ancestry. PPR21A mutation was not significantly associated with obesity although the number of patients with this mutation was small (P = 0.84, estimate = 0.01, n = 22). Furthermore, after controlling for pack-years and clinical smoking status in this validation cohort, our results remained statistically significant (Supplementary Table 5). Thus, in lung adenocarcinoma, obesity predisposes patients to a higher frequency of KRAS and SETD2 mutations and a lower frequency of EGFR mutations.

Finally, we considered the possibility that associations between BMI measurements and genomic alterations were indirectly induced by cancer-associated weight changes at cancer diagnosis (for example, cachexia). To evaluate this possibility in lung cancer, we reviewed initial diagnosis medical notes from a subset of patients with treatment-naive lung adenocarcinoma treated at Memorial Sloan Kettering (MSK) and classified them according to whether they exhibited signs of cachexia, anorexia or other unexplained weight loss in the 6 months before diagnosis (Methods). There was no association between clinically notable weight loss before diagnosis and the presence of EGFR (P = 0.70, odds ratio (OR) = 1.16) or KRAS (P = 0.69, OR = 1.15) mutations (Fig. 1f), indicating that the association between BMI and these mutations is not confounded by pre-diagnosis weight loss.

Previous work has demonstrated the modulatory effects of obesity on the emergence of cancer, which is complex and results in higher risk of certain cancers, such as liver and colorectal, and lower risk of others, such as lung. Our results suggest that, after controlling for numerous host and environmental factors, obesity also predisposes patients to cancers with particular somatic genotypes.

Our data suggest that EGFR-mutant lung adenocarcinoma has a lower frequency in patients with elevated BMI and, conversely, that KRAS-mutant lung adenocarcinoma is more frequent in patients with obesity. These findings have treatment implications: both genes have molecularly targeted therapies14 and higher proportions of immune-sensitive KRAS drivers may explain why obese patients respond better to immunotherapy15. More fundamentally, few modifiable risk factors of EGFR/KRAS-mutant lung adenocarcinoma have been characterized. A recent study6 suggests that air pollution may trigger microenvironmental changes, allowing for proliferation of cells with activated EGFR. This is one of the first studies to suggest that obesity is a similar modifier of EGFR-mutant lung adenocarcinoma risk. There are many plausible mechanisms by which obesity might affect selection for certain molecular drivers, including modulation of lipid metabolism in cancer cells via PIK3CA/mammalian target of rapamycin pathways7. Obesity itself may also inhibit the function of CD8+ T cells5, which can affect the selective pressure for immunogenic driver alterations. As KRAS-mutant lung adenocarcinoma is generally more sensitive to immune checkpoint blockade than EGFR-mutant lung adenocarcinoma16, it seems plausible that changes to the immune microenvironment downstream of obesity may disproportionately allow for proliferation of KRAS-mutant lung adenocarcinoma in patients with obesity. Although real-world histopathological imaging data increasingly excel at identifying tumor-infiltrating lymphocytes and other markers of immune system activity17, accompanying BMI data are not routinely published. Such data are necessary to confirm any causal link between obesity and immunological mechanisms of selection for particular drivers.

Our study has several limitations. We were underpowered to detect associations between BMI and genomic alterations in rare cancer types and infrequently mutated genes. There may be other genomic changes, including structural rearrangements, associated with BMI, which we did not consider in this analysis. BMI itself is only one of many metrics for studying obesity and we did not have access to other metrics, such as skinfold thickness, in the clinical record3. Genomic studies do not fully capture the molecular landscape of a tumor, and broader studies that combine genomic, transcriptomic and metabolomic analyses would provide a more well-rounded view of the effect of obesity on cancer tumor phenotypes18. Finally, and most importantly, prospective studies will have to be done to determine whether modifications to BMI, through diet, exercise or medical interventions, can causally modify the incidence of particular cancer genotypes.

Methods

Patient cohort

The present study utilized data from MSK-IMPACT (no. NCT01775072), a prospective observational cohort study of tumor evolution. The present study received full ethical approval by the institutional review board at MSK Cancer Center (MSKCC). For the present study, patients provided written informed consent for the use of their genomic data for research. Participants in the present study were not compensated for their participation. Tumors were sequenced using the MSK-IMPACT assay through to 23 March 2023. In patients with multiple samples, only one sample (the earliest sampled primary tumor) was included in the final cohort. Clinical characteristics were annotated per the standard MSK-IMPACT workflow. The total cohort constitutes 34,274 samples spanning 102 cancer types. Only cancer–gene pairs with >50 samples in the cancer type and >10 mutations of the given gene in the cancer type were tested for genotype–BMI associations. For each patient, we identified the BMI measurement collected at the date nearest to the date of sample acquisition leading to genomic sequencing, that is, the earliest possible date for which a given genomic alteration could be confirmed in a patient’s tumor. BMI distribution of cancer types is shown in Extended Data Fig. 1. Patients without a BMI measurement within 30 d of tumor acquisition were excluded. Patients at the Dana-Farber Cancer Institute (DFCI) had tumor genomic profiling with OncoPanel19, a targeted sequencing platform analogous to MSK-IMPACT.

Targeted DNA sequencing with MSK-IMPACT

DNA sequencing was performed using the MSK-IMPACT sequencing panel, which is a hybridization capture-based, next-generation sequencing assay, in a Clinical Laboratory Improvement Amendments-certified molecular laboratory. Genomic DNA from formalin-fixed, paraffin-embedded primary or metastatic tumors and matched normal samples was extracted and sheared. Customized probes were then synthesized for targeted sequencing of all exons and select introns of 341, 410, 468 or 505 genes. Illumina HiSeq 2500 was used to capture pooled libraries containing captured DNA fragments to high, uniform coverage (>500 median coverage). All classes of genomic alterations including substitutions, insertions/deletions (indels), copy number alterations and rearrangements were determined and called against the patient’s matched normal sample. The computational pipelines used for variant calling are based on standard best practices and used a combination of open-source and custom-written scripts and programs.

The OncoKB precision oncology knowledge base was used to annotate genomic alterations. OncoKB identifies functionally relevant cancer variants and their potential clinical actionability. Only alterations classified as oncogenic by OncoKB were used in this analysis. Reported alteration frequencies were adjusted to account for the specific set of genes included in each version of the MSK-IMPACT panel used by dividing the number of gene-specific alterations by the number of samples for which a given gene was sequenced.

Logistic regression model

Logistic regression was performed where BMI (as a continuous variable) was used to predict somatic mutation status (‘Gene’) in a particular cancer type. Only detailed cancer types with >50 patients and the 341 genes in the initial MSK-IMPACT panel were included in the present study. Covariates for the logistic regression included age, sex, ancestry and TMB. Smoking status was included in models with lung adenocarcinoma. The glm function from the stats R package was used to conduct the modeling. P values were Benjamini–Hochberg corrected.

The following equations were used:

  1. (1)

    Gene ~ BMI

  2. (2)

    Gene ~ BMI + age + sex + ancestry + TMB

  3. (3)

    Gene ~ BMI + age + sex + ancestry + smoking status + TMB.

Pre-diagnosis weight loss identification and analysis

To identify pre-diagnosis weight loss in patients, 800 initial consultation notes were randomly selected from patients with treatment-naive lung adenocarcinoma and manually categorized. Records with mentions of the patient reporting weight loss, anorexia or decreased appetite were labeled as having pre-diagnosis weight loss. Records with no mention of weight loss, anorexia or decreased appetite were labeled as having no pre-diagnosis weight loss.

Statistics and reproducibility

No statistical method was used to predetermine sample size. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment because no subjective judgements were used.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.