Introduction

The unequal landscape of healthcare outcomes in the United States, particularly among Black patients, demands immediate attention. Black patients face higher rates of chronic disease, higher incidence of preventable illness, and higher all-cause mortality compared to white patients1,2,3,4. These disparities are particularly evident within the field of oncology. For many cancer types, Black patients are diagnosed with more advanced or aggressive cancers than white patients5,6,7. Although cancer outcomes have drastically improved over the past several decades, Black patients continue to have higher rates of cancer-related death8,9,10. Mortality disparities are exacerbated within certain cancer types, including breast and endometrial cancers, in which Black patients exhibit 41% and 21% higher mortality, respectively, compared to white patients11,12,13. Uncovering the mechanisms that mediate these unequal outcomes could help identify prevention or treatment strategies that may aid those populations most at risk.

The source of racial disparities in cancer outcomes is at present unresolved and may result from social, environmental, and genetic factors14,15,16,17,18,19,20. In the United States, Black individuals are more than twice as likely to live below the federal poverty line than white individuals, and lower socioeconomic status has been linked with decreased healthcare access and lower cancer screening rates21,22,23,24,25. In addition, people within economically-deprived communities have disproportionately higher rates of environmental carcinogen exposure, which may increase the risk of cancer development25,26. Within the healthcare system itself, Black patients received worse care compared to white patients in more than half of quality care metrics in the 2022 National Healthcare Quality and Disparity Report27. These findings extend to cancer-specific interventions, as Black patients experience more delays in chemotherapy induction for breast cancer treatment and are less likely to have adequate oncological resection for gastrointestinal cancers treated with curative intent surgery28,29.

In addition to these social and environmental influences on cancer mortality, recent research has raised the possibility that genetic differences between patient populations could affect the development and/or progression of cancer. To date, most population-based cancer profiling efforts have sought to investigate variability in the prevalence of somatic point mutations between racial and ethnic groups30,31,32,33,34. These studies have uncovered certain differences in the frequency of mutations in common oncogenes and tumor suppressors that could impact disease pathogenesis. For example, Black patients have been found to exhibit higher rates of mutations in the tumor suppressor TP53 compared to white patients, and TP53 inactivation has consistently been linked with poor prognosis35,36,37,38,39,40,41. Similarly, Black patients with lung cancer have fewer mutations in the druggable oncogene EGFR, which may affect treatment options42. Uncovering population-based differences in mutation profiles can aid in clinical assessment and may shed light on strategies to ameliorate outcome disparities.

Comparatively less is known about population-based associations for other types of somatic alterations in cancer beyond single-nucleotide point (SNP) mutations. Notably, chromosomal copy number alterations (CNAs) are pervasive across tumor types and have been linked with disease progression, drug resistance, and poor patient outcomes43,44,45,46. One recent study examined the prevalence of arm-scale CNAs across a large cohort of cancer patients and found few significant differences between Black and white populations47. Population-based differences in many other classes of CNAs have not been previously investigated.

One common type of chromosomal alteration is a whole-genome duplication (WGD) event, in which a cell’s chromosome complement doubles. The causes and consequences of WGDs are poorly understood. Mutations in TP53 have been consistently associated with WGDs in patient sequencing, and cell culture experiments have demonstrated that loss of TP53 facilitates the outgrowth of cells that have undergone a failed mitosis48,49. Environmental carcinogens like combustion products have been shown to cause point mutations, but a link between these pollutants and WGDs has not been previously demonstrated50,51. Other genetic and environmental drivers of WGD events remain obscure44,52,53. WGDs enhance tumor adaptability and increase metastatic dissemination, potentially by enhancing tumor heterogeneity and allowing cancers to sample a wider range of karyotypes52,54,55,56. Approximately 30% of tumors exhibit WGDs, but whether patient race or ethnicity is associated with these events is unknown52.

In this work, we show that tumors from self-reported Black cancer patients exhibit a significantly higher frequency of WGD events compared to tumors from self-reported white patients across multiple cancer types. We demonstrate that this disparity in WGD frequency is associated with worse clinical outcomes and may be linked to differential environmental carcinogen exposure. Our findings identify a type of large-scale chromosomal alteration that is more prevalent in Black cancer patients and may contribute to racial disparities in cancer outcomes.

Results

Tumors from self-reported Black patients display an increased frequency of WGDs

WGD events in cancer are associated with genomic instability and aggressive disease (Fig. 1A)52,57. We investigated the frequency of WGD events in cancers from different patient cohorts: MSK-MET (n = 13,071 patients), The Cancer Genome Atlas (TCGA) (n = 8060 patients), and the Pan-cancer Analysis of Whole Genomes (PCAWG) (n = 1963 patients)58,59,60. These three datasets represent the largest publicly-available tumor sequencing cohorts with both copy number alteration data and patient demographic information. Full demographics of included patients and cancer types can be found in Table S1. We compared the frequency of WGD events in cancers from self-reported Black and white patients in the MSK-MET and TCGA cohorts and between cancers from individuals with inferred African and European ancestry in the PCAWG cohort. We discovered that cancers from self-reported Black patients and patients with inferred African ancestry exhibited a significantly higher incidence of WGDs compared to cancers from self-reported white patients or patients with inferred European genetic ancestry (Fig. 1B-D). Notably, CNAs in each of these datasets were detected using different genomic technologies: SNP arrays, targeted gene sequencing, and whole-genome sequencing, for TCGA, MSK-MET, and PCAWG, respectively58,59,60. As these findings were consistent across all three cohorts, we anticipate that they are robust and independent of any platform-specific artifacts.

Fig. 1: WGDs are more common among self-reported Black cancer patients and those with inferred African genetic ancestry.
A A schematic of the whole-genome duplication process. WGDs produce a cell with a doubled chromosome complement (typically 4N). WGD events are associated with metastasis and disease progression. Loss of the tumor suppressor TP53 promotes WGDs; other causes of WGDs are largely unknown. The frequency of WGDs in self-reported Black and white patients from three different cohorts: B MSK-MET, C TCGA, and D PCAWG. Note that in the PCAWG dataset, a patient’s self-reported race was not available, and instead inferred genetic ancestry was used (African (AFR) vs European (EUR)). E The frequency of WGD events in self-reported Black and white patients with either breast cancer, endometrial cancer, or NSCLC, in the MSK-MET and TCGA cohorts. Source data are provided as a Source Data file. Statistical testing was performed via two-tailed Pearson’s Chi-squared test. Statistical significance: NS p ≥ 0.05, * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001. Figure 1A was created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).

The increase in WGD events in self-reported Black patients compared to self-reported white patients ranged from 11% in the TCGA cohort to 35% in the PCAWG cohort. Additionally, the overall rate of WGD events ranged from 26% in MSK-MET to 36% in TCGA. However, within cancer types that were shared across datasets, the frequency of WGD events was highly correlated (Fig. S1). For instance, thyroid cancers consistently displayed the lowest incidence of WGDs (0–4%) while ovarian cancers consistently displayed the highest incidence of WGDs (55–60%), in alignment with other reports44,52. The overall differences in WGD frequency between cohorts may reflect differences in the distribution of cancer types.
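The comparisons above rest on two simple quantities: the relative increase in WGD frequency between two groups, and a Pearson's Chi-squared test on the corresponding 2×2 table. A minimal sketch of both follows. The counts are hypothetical and are not taken from any of the cohorts, the function name is illustrative, and the test is computed without a continuity correction, which is an assumption because the text does not say whether one was applied.

```python
import math


def wgd_comparison(wgd_a, total_a, wgd_b, total_b):
    """Compare WGD frequency in group A against group B.

    Returns (freq_a, freq_b, relative_increase, chi2, p_value). The test is
    Pearson's chi-squared on the 2x2 table (1 degree of freedom), computed
    without a continuity correction.
    """
    freq_a = wgd_a / total_a
    freq_b = wgd_b / total_b
    # Relative (not absolute) increase of group A over group B.
    relative_increase = (freq_a - freq_b) / freq_b

    observed = [[wgd_a, total_a - wgd_a], [wgd_b, total_b - wgd_b]]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [wgd_a + wgd_b, n - wgd_a - wgd_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-squared upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return freq_a, freq_b, relative_increase, chi2, p_value


# Hypothetical counts for illustration only: 40/100 vs 300/1000 WGD-positive.
freq_a, freq_b, rel, chi2, p = wgd_comparison(40, 100, 300, 1000)
print(f"{freq_a:.2f} vs {freq_b:.2f}, relative increase {rel:.0%}, "
      f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

Reporting the relative increase alongside the absolute frequencies matters here because the baseline WGD rate differs between cohorts, so the same absolute gap corresponds to different relative changes.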

Next, we sought to evaluate whether other minority populations also exhibited an increase in WGD events. We therefore investigated whether WGD events were elevated in cancers from patients who identified as Asian, which represented the next-most common racial group in the TCGA and MSK-MET cohorts. In both patient cohorts, we did not detect a significant difference in WGD frequency between self-reported white and Asian patients (Fig. S2A and Table S2). Additionally, tumors from males and females displayed equivalent frequencies of WGD events within each cohort (Fig. S2B, Table S2).

Tumors from self-reported Black patients with specific cancer types display an increased frequency of WGDs

We considered the possibility that the increased incidence of WGD events in our pan-cancer analysis of self-reported Black patients could result from differences in the representation of distinct cancer types between populations. However, Black patients have historically been underrepresented in genomic studies, which limits our statistical power to detect significant differences in every cancer lineage61,62. MSK-MET includes 7.3% self-reported Black patients (n = 959 patients), TCGA includes 10.8% self-reported Black patients (n = 872 patients), and PCAWG includes 6.2% patients with inferred African ancestry (n = 122 patients) (Table S1). To minimize false negatives resulting from the underrepresentation of Black patients, we focused our analysis on cancer types for which we had genomic data from at least 60 self-reported Black patients in both the MSK-MET and TCGA datasets. The three cancer types exceeding this threshold were breast cancer, endometrial cancer, and non-small cell lung cancer (NSCLC) (Table S1). Stratification of PCAWG by cancer type was not performed due to the low number of patients with inferred African ancestry. In both TCGA and MSK-MET, we detected a significant increase in WGD events in self-reported Black patients with breast and endometrial cancers (Fig. 1E). For NSCLC, we detected a significant increase in the MSK-MET cohort but not the TCGA cohort. The increase ranged from 33% in breast cancer (TCGA) to 202% in endometrial cancer (MSK-MET) (Fig. 1E). These results indicate that self-reported Black patients have a significantly higher incidence of WGD events both across cancers and within individual cancer types.

Analysis of WGD frequency by tumor stage and histological subtype

WGD abundance varies between histological cancer subtypes and is more common in advanced malignancies52. Accordingly, we considered the possibility that differences in the prevalence of histological subtypes or differences in the stage at which tumors were diagnosed could produce the increase in WGD events in self-reported Black cancer patients that we observed. However, we determined that WGD events were still significantly more common in tumors from self-reported Black patients for many individual cancer stages and subtypes (Figs. S3, S4; Table S1). For instance, self-reported Black patients with either stage II or III breast tumors had a higher incidence of WGD events compared to white patients with similarly staged tumors (Fig. S3B). Additionally, while the frequency of histological subtypes varied between self-reported Black and white patients, when we limited our analysis to only the most common histological subtype of each cancer (breast cancer: invasive ductal carcinoma; endometrial cancer: endometrioid tumors; and NSCLC: adenocarcinoma), we still detected an increased frequency of WGD events among self-reported Black patients (Fig. S4A, B). These results indicate that the increased incidence of WGDs in self-reported Black cancer patients is not simply due to differences in tumor stage or histological subtype at diagnosis.

Eliminating overlap between the TCGA and PCAWG cohorts

596 patients in TCGA were also profiled as part of the PCAWG cohort. To ensure statistical independence, we repeated our analysis of WGD frequencies in TCGA after eliminating the patients who were also included in PCAWG. Within this smaller patient population, we still observed a trend towards increased WGDs in self-reported Black patients in a pan-cancer analysis and a significant increase in WGDs in self-reported Black patients with breast cancer (Fig. S5 and Table S3).

Analysis of WGDs by inferred genetic ancestry

Recent studies have urged caution in the use of race when conducting population-based research, noting that widely used definitions of race represent artificial social constructs63,64. Instead, shared genetic ancestry may better reflect population differences in disease risk. Nonetheless, race may still be useful for investigating patterns of health and disease in the US due to its association with the social determinants of health, including poverty, pollution, systemic racism, and lack of healthcare access65. According to the National Academy of Sciences’ 2023 report Using Population Descriptors in Genetics and Genomics Research, “race… may be a useful population descriptor for researchers who wish to measure a consequential form of social status and affiliation… [R]ace may be a proxy for the experience of racism in health disparities studies.”

As described above, we found that both individuals who self-reported as Black or African American and individuals with African ancestry were more likely to exhibit WGD-positive cancers compared to individuals who self-reported as white or individuals with European ancestry (Fig. 1B–E). To extend this investigation, we re-analyzed the MSK-MET dataset using inferred genetic ancestry instead of racial self-reporting34. We found that the overall patterns of WGD were conserved: African ancestry was associated with a 16% increase in WGD events in a pan-cancer analysis and with a 32-135% increase in breast, endometrial, and NSCLC (Fig. S6A, B; Table S4). Additionally, inferred African ancestry and inferred European ancestry were 96.3% and 99.9% concordant with self-reported Black and white identity, respectively (Fig. S6C, D). Finally, the fraction of inferred African ancestry was associated with an increasing frequency of WGDs, while the fraction of inferred European ancestry was associated with a decreasing frequency of WGDs (Fig. S6E, F). We conclude that WGD events are associated with both inferred African ancestry and Black self-identity.

Increasing aneuploidy burden associated with WGD events in self-reported Black patients

Previous analyses of genetic differences between cancers from Black and white patients have focused on exploring the spectrum of somatic point mutations in each population30,31,32,33,34,42. As our investigation uncovered a significant increase in the frequency of WGDs in Black patients, we next sought to expand our analysis of CNAs to interrogate all aneuploidy events. We calculated the aneuploidy burden, defined as the sum total of arm-scale CNA events, in each tumor genome. We discovered that cancers from self-reported Black patients had higher levels of aneuploidy in a pan-cancer analysis (Fig. S7A, B). We then analyzed breast cancer, endometrial cancer, and NSCLC individually, as these cancers harbored the largest number of specimens from self-reported Black patients, and we found that the average number of aneuploid chromosomes was higher in self-reported Black patients with these cancer types as well (Fig. S7C). However, cancers that have undergone WGD events have consistently been observed to exhibit a higher aneuploidy burden than WGD-negative cancers56,66. Indeed, when we analyzed the aneuploidy levels in WGD-positive and WGD-negative cancers separately, the differences between Black and white patients were muted (Fig. S7D–F). There was no significant increase in aneuploidy burden among WGD-negative Black patients, and among WGD-positive Black patients there was an increase in aneuploidy burden in the TCGA cohort but not the MSK-MET cohort in a pan-cancer analysis.
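The aneuploidy burden defined above, and the reason for analyzing WGD-positive and WGD-negative tumors separately, can be illustrated with a short sketch. The arm-level calls, the dictionary layout, and the function names are hypothetical and are not the data structures used in the study; each arm is assumed to be coded as -1 (loss), 0 (neutral), or +1 (gain).

```python
from collections import defaultdict


def aneuploidy_burden(arm_calls):
    """Count chromosome arms called as gained (+1) or lost (-1)."""
    return sum(1 for call in arm_calls.values() if call != 0)


def mean_burden_by_wgd(tumors):
    """Average aneuploidy burden separately for each WGD status."""
    burdens = defaultdict(list)
    for tumor in tumors:
        burdens[tumor["wgd"]].append(aneuploidy_burden(tumor["arms"]))
    return {status: sum(values) / len(values)
            for status, values in burdens.items()}


# Hypothetical tumors for illustration only.
tumors = [
    {"wgd": True, "arms": {"4q": -1, "8p": -1, "16q": 1, "1q": 1}},
    {"wgd": True, "arms": {"8p": -1, "17p": -1}},
    {"wgd": False, "arms": {"16q": -1, "17p": 0}},
]
means = mean_burden_by_wgd(tumors)
print(means)
```

Because WGD-positive tumors tend to carry more arm-level events, a group with more WGD-positive tumors will show a higher pooled burden even if burden within each WGD stratum is similar, which is why the stratified comparison is the informative one.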

Finally, we examined individual chromosome arm gain and loss events in self-reported Black and white cancer patients. We found that cancers from self-reported Black patients were significantly more likely to lose the q arm of chromosome 4 and p arm of chromosome 8 compared to cancers from self-reported white patients (Fig. S7G). There was no significant difference among any of the other 39 possible aneuploidies that were quantifiable. In a subtype-specific analysis, we did not observe any consistent difference in aneuploidy patterns in NSCLC, while in breast and endometrial cancers, self-reported Black patients demonstrated increased frequencies of specific aneuploidy events shared between MSK-MET and TCGA. In breast cancer, self-reported Black patients demonstrated an increase in 16q gains compared to self-reported white patients. Similarly, self-reported Black patients consistently displayed 16q and 17p loss events in endometrial cancer. When patients were subdivided by WGD status, no significant differences in arm-scale aneuploidies were detected (Fig. S7G). We conclude that there are moderate differences in the quantity and frequency of arm-scale aneuploidies in self-reported Black patients, though the most prevalent and consistent copy number difference is an increase in WGD events.

Association of WGD events with TP53 mutations

We sought to uncover the somatic alterations associated with WGD events in self-reported Black and white patients. For this and subsequent analyses, we focused on the MSK-MET cohort, as this was the largest single patient cohort and had the best sequencing coverage of cancer-relevant genes. We constructed a logistic regression model linking somatic alterations with WGD status while correcting for cancer type (Fig. 2A, Supplementary Data 1). The strongest predictor of WGD status was the presence of mutations in TP53 (ref. 52). In contrast, mutations in KRAS, BRAF, and PTEN were significantly associated with a reduced likelihood of WGD events. The association between TP53 mutations and WGD events has been previously observed, as loss of p53 enhances the proliferation of cells that have undergone tetraploidization52,67,68. The reasons why mutations in certain other oncogenes and tumor suppressors are associated with fewer WGD events are at present unclear and may reflect differences in the evolutionary trajectories of these cancers69,70. We repeated our logistic regression analysis for self-reported Black and white patients, considered separately (Fig. 2B, C, Supplementary Data 2, 3). In self-reported Black patients, the only significant features associated with WGD status with a q-value < 0.1 were TP53 mutations and amplifications of cyclin E (CCNE1). Cyclin E gains have recently been identified as a driver of WGDs, and both features were also significantly associated with WGD events in our logistic regression model of tumors from white patients (Fig. 2C)53. These results suggest that similar somatic alterations drive WGDs in both self-reported Black and white patients.
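The q-values referred to above come from a multiple-testing correction applied to the per-alteration p-values. A minimal sketch of the Benjamini-Hochberg step-up adjustment is shown here; it illustrates the standard procedure rather than the authors' own code, and the p-values are made up for demonstration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values, one per tested alteration.
p_vals = [0.001, 0.02, 0.03, 0.5]
q_vals = benjamini_hochberg(p_vals)
print([round(q, 3) for q in q_vals])
```

An alteration is then called significant when its q-value falls below the chosen false discovery rate threshold.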

Fig. 2: Genetic analysis of WGD events in self-reported Black and white cancer patients.
A A volcano plot displaying genetic alterations associated with an increased or decreased likelihood of WGD events across both self-reported Black and white patients in the MSK-MET cohort. Acronyms: Mutant (Mut). B A table displaying 10 events exhibiting the strongest correlation with WGD events among self-reported Black patients. The bolded numbers indicate q-values below 0.05. C A table displaying 10 events exhibiting the strongest correlation with WGD events among self-reported white patients. The bolded numbers indicate q-values below 0.05. D A bar graph displaying the frequency of TP53 mutations across self-reported Black and white cancer patients. E A bar graph displaying the frequency of TP53 mutations across self-reported Black and white patients with either breast cancer, endometrial cancer, or NSCLC. F A bar graph displaying the frequency of WGD events among self-reported Black and white cancer patients, divided based on TP53 status. G A bar graph displaying the frequency of WGD events among self-reported Black and white patients with either breast cancer, endometrial cancer, or NSCLC, divided based on TP53 status. H A bar graph displaying the distribution of different types of TP53 mutations in self-reported Black and white cancer patients. I A bar graph displaying the distribution of different types of TP53 mutations in self-reported Black and white patients with either breast cancer, endometrial cancer, or NSCLC. J A lollipop plot displaying the sites of TP53 mutations in tumors from either self-reported white (top) or Black (bottom) cancer patients. Source data are provided as a Source Data file. Statistical testing was performed via two-tailed Wald test (A–C) with correction by Benjamini-Hochberg’s method (B, C) and two-tailed Pearson’s Chi-squared test (D–I). Statistical significance: NS p ≥ 0.05, * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001.

As the types of somatic alterations associated with WGD events in self-reported Black and white patients were similar, we considered the possibility that differences in frequencies of these alterations could influence the prevalence of WGDs. Consistent with previous observations, we found that self-reported Black patients were significantly more likely to harbor TP53 mutations than self-reported white patients in both a pan-cancer analysis and in breast and endometrial cancers but not NSCLC (Fig. 2D, E)35,36,37,38,39,40. Next, we separated patients based on TP53 status. We observed that there was no significant difference in WGD frequency between Black and white patients with TP53-WT cancers. However, among patients with TP53-mutant cancers, self-reported Black patients showed a trend towards increased WGD events in a pan-cancer analysis and a significant increase in the NSCLC cohort (Fig. 2F, G). These results suggest that the increased frequency of TP53 mutations contributes to but cannot fully account for the increased incidence of WGD events in Black patients.

Next, we investigated whether different classes of TP53 mutations could underlie the different prevalence of WGD events within self-reported Black and white populations. However, the overall distribution of classes of TP53 mutations (e.g., missense vs nonsense) was similar between self-reported Black and white patients (Fig. 2H, I). Additionally, the same set of p53-inactivating point mutations, including the R175, R248, and R273 mutations, was commonly observed among both Black and white patients (Fig. 2J).

We subsequently investigated the association between WGD events and CCNE1 amplifications in self-reported Black and white patients. Consistent with previous results, we found that CCNE1 amplifications were more common in self-reported Black patients (Fig. S8A, B)35. However, when we split patients based on CCNE1 amplification status, we observed that a consistent difference in the frequency of WGD events between self-reported Black and white patients remained (Fig. S8C, D). We conclude that somatic genetic alterations are unable to fully account for increased prevalence of WGD events among self-reported Black patients.

Association between self-reported race, WGD Status, and patient outcome

We sought to determine whether the increased incidence of WGD events in self-reported Black patients was linked with racial disparities in patient outcome. Consistent with previous observations, WGD events were strongly associated with hallmarks of aggressive disease52,71. The rate of WGDs was significantly higher in metastatic samples compared to primary tumor samples in both a pan-cancer analysis and in individual cancer types (Fig. S9A, B). WGD-positive tumors were more likely to exhibit progressive disease after frontline treatment, and patients with WGD-positive tumors were less likely to be tumor-free at the conclusion of the observational follow-up period (Fig. S9C, D). We performed a multivariate Cox proportional hazard regression analysis including WGD status and common clinical variables (age, sex, TP53 status, MSI status, aneuploidy burden, cancer type). WGD status was significantly associated with shorter patient survival even when including these known disease-modifying factors (Hazard Ratio: 1.21, 95% CI: 1.12–1.30, z(8) = 4.783, p < 1 × 10⁻⁵) (Fig. S10). These results are in agreement with other studies demonstrating that WGD can drive malignant progression52,72.
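As a quick consistency check on the Wald statistic reported above, a z-score can be converted to a two-sided p-value. The sketch below assumes a standard normal reference distribution, which the text does not state explicitly; under that assumption, z = 4.783 corresponds to a p-value on the order of 10⁻⁶.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a Wald z-statistic under a standard normal."""
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(abs(z) / math.sqrt(2))


p_wgd = two_sided_p_from_z(4.783)
print(f"p = {p_wgd:.2e}")
```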

Next, we looked at the interaction of self-reported race, WGD, and patient outcome. Consistent with known national trends, we found that self-reported Black cancer patients exhibited a significantly shorter overall survival time following diagnosis compared to self-reported white patients (Fig. 3A)8,9,10. Similarly, WGD events were also associated with worse patient outcomes in a Kaplan-Meier analysis (Fig. 3B). Interestingly, among the subset of patients with WGD-positive tumors, there was no significant difference in survival time between self-reported Black and white patients (Fig. 3C). However, within the WGD-negative patient subset, a significant difference in survival between self-reported Black and white patients was observed (Fig. 3D). These results are consistent with prior reports demonstrating that WGD events drive metastases and aggressive disease, and suggest that the increased incidence of WGD events in self-reported Black patients may be linked with worse overall survival46,58,73. We speculate that within WGD-negative patients, WGD events may be occurring post-diagnosis, or additional genetic and environmental factors may be contributing to these disparate outcomes.
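The Kaplan-Meier curves referred to above are product-limit estimates of survival. A minimal sketch of the estimator follows, using hypothetical follow-up times rather than MSK-MET data; the function name and input layout are illustrative, and the logrank comparison between groups is not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Pool all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for illustration only.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Running the same estimator separately within WGD-positive and WGD-negative patients mirrors the stratified comparison described in the text.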

Fig. 3: Pan-cancer survival analysis in self-reported Black and white cancer patients.
A A Kaplan-Meier plot displaying survival of self-reported Black and white patients in the MSK-MET cohort. B A Kaplan-Meier plot displaying survival of cancer patients based on WGD status. C A Kaplan-Meier plot displaying survival of WGD-positive self-reported Black and white cancer patients. D A Kaplan-Meier plot displaying survival of WGD-negative self-reported Black and white cancer patients. Source data are provided as a Source Data file. Statistical testing was performed via logrank test (A–D).

We repeated this analysis within the MSK-MET breast, endometrial, and NSCLC cohorts. We observed that self-reported Black patients with endometrial cancer, but not breast cancer or NSCLC, had significantly shorter survival times compared to white patients within this study (Fig. S11A–C). Racial disparities within breast cancer and NSCLC outcomes have been reported in other analyses, and we speculate that the MSK-MET cohort sizes may not be sufficiently large to detect survival differences that are moderate overall11,12,13. Within endometrial cancer, we observed that patients with WGD-positive tumors had worse outcomes (Fig. S11D). Consistent with our pan-cancer analysis, within the subset of tumors that were WGD-positive there was no significant difference between self-reported Black and white patient outcomes, while a significant difference remained apparent within the WGD-negative subset (Fig. S11E, F).

Finally, we further divided the pan-cancer survival analysis based on TP53 status. As expected, TP53-mutant tumors had significantly worse overall survival (Fig. S12A). In general, further subdividing patients based on TP53 status, in addition to race and WGD status, minimized race-based differences in survival time (Fig. S12B–I). In total, these results suggest that the increased incidence of TP53 mutations and WGD events contribute to the worse overall outcomes for self-reported Black cancer patients, although our findings do not rule out the influence of additional social, environmental, and genetic factors.

Analysis of WGD events in self-reported Black and white patients with prostate cancer

In the United States, there is a pronounced and pernicious racial disparity between Black and white patients with prostate cancer74,75. Overall, African Americans are about twice as likely to have metastatic disease and die of prostate cancer compared to white Americans76,77. We therefore investigated whether WGD events could contribute to this disparity. We found that there was no significant difference in WGD frequency between self-reported Black and white patients in the MSK-MET cohort (Fig. S13A). We obtained a similar result within TCGA, though we note that this analysis is limited by the fact that there are only seven self-reported Black patients with prostate cancer in this cohort (Fig. S13B). Consistent with previous studies, we found that self-reported Black prostate cancer patients had worse overall survival than white patients and WGD events were also associated with shorter survival (Fig. S13C, D)74,77. As with our pan-cancer analysis, we found that the survival disparities between Black and white patients were maintained among WGD-negative cancers, while no difference in survival was apparent among cancers that had undergone WGDs (Fig. S13E, F). In total, our data indicate that WGD events are associated with aggressive disease in prostate cancer overall; however, these data also suggest that WGD events themselves are not a prominent driver of disparities in this specific cancer type.

Self-reported race and WGD status are associated with metastases to the same anatomic sites

Next, we used additional patient information that was collected as part of the MSK-MET dataset to further explore the clinical correlates of self-reported race and WGD status. Within this cohort, self-reported Black patients were diagnosed and underwent surgery at earlier ages compared to self-reported white patients, and self-reported Black patients were also more likely to die younger (Fig. 4A). Similarly, WGD-positive tumors were associated with younger diagnoses, younger age at surgery, and younger death in all patients, regardless of race (Fig. 4B). Next, we examined microsatellite instability in each patient, which is a genomic state characterized by the accumulation of point mutations in repetitive sequences78,79. We found that tumors from self-reported Black patients were significantly less likely to exhibit high microsatellite instability (MSI-H) compared to tumors from white patients, and MSI-H status was also less common in WGD-positive tumors (Fig. 4C, D). MSI-H status has been linked with favorable outcomes, and the under-representation of MSI-H cancers among tumors from both self-reported Black patients and patients with WGD-positive disease could represent another factor that contributes to differences in patient survival80,81.

Fig. 4: Clinical correlates of WGD status in self-reported Black and white cancer patients.
A A table displaying the demographics of self-reported Black and white cancer patients in the MSK-MET cohort. B A table displaying the demographics of cancer patients based on WGD status. C A bar graph displaying the frequency of microsatellite instability (MSI-H) among tumors from self-reported Black and white cancer patients. D A bar graph displaying the frequency of microsatellite instability (MSI-H) among cancer patients based on WGD status. E The frequency of metastatic dissemination to different anatomic sites, divided by patient race. Locations written in red are significantly more likely among self-reported Black patients; no locations were more likely among self-reported white patients. For Ovary and Female Genital locations, frequency of metastasis represents female patients only. For Male Genital location, frequency of metastasis represents male patients only. F The frequency of metastatic dissemination to different anatomic sites based on WGD status. Locations written in red are significantly more likely among WGD-positive patients; no locations were more likely among WGD-negative patients. For Ovary and Female Genital locations, frequency of metastasis represents female patients only. For Male Genital location, frequency of metastasis represents male patients only. Central Nervous System (CNS), Lymph Node (LN), Peripheral Nervous System (PNS), Urinary Tract (UT). Statistical testing was performed via two-tailed Wilcoxon rank-sum test (A, B) and two-tailed Pearson’s Chi-squared test (C–F). Source data are provided as a Source Data file. Statistical significance: NS p ≥ 0.05, * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001.

Finally, we compared the frequency of metastatic dissemination to different anatomic sites between patient populations. Consistent with established disparities in overall outcomes, we observed that self-reported Black cancer patients had a greater incidence of metastatic disease compared to white patients. Notably, we found that self-reported Black patients had significantly higher rates of metastases to regional lymph nodes, distant lymph nodes, the intra-abdominal space, the male and female genitourinary system, and skin (Fig. 4E, Table S5, Supplementary Data 4). Intriguingly, metastases to each of these sites except the male genitourinary system were also more common among WGD-positive tumors compared to WGD-negative tumors (Fig. 4F, Table S6, Supplementary Data 5). In total, these findings suggest that several differences in the clinical presentation of cancers in self-reported Black and white populations could be related to the increased incidence of WGDs among Black patients.

Carcinogen exposure drives WGD events in cell culture

As we found that the genetic drivers of WGD events were highly similar between self-reported Black and white patients, we next sought to investigate whether differences in environmental exposures could contribute to the increased WGD events observed in self-reported Black patients. In the United States, Black Americans are disproportionately exposed to environmental carcinogens82,83,84. This has been partially attributed to historic redlining, a discriminatory practice in which minority communities were concentrated in less desirable neighborhoods near pollution-emitting factories and highways85,86. In contemporary epidemiological studies, minority communities continue to live in disadvantaged neighborhoods with higher rates of pollution exposure and cancer mortality compared to white Americans87,88. Many carcinogens are known to accelerate the development of point mutations; however, a link between environmental pollutants and WGD events has not been reported50,51.

We established a cell culture system to investigate the link between various common carcinogens and WGDs89,90. We cultured murine lung epithelial cells alone (monoculture system) or, as a model of lung inflammation, we cultured murine lung epithelial cells along with alveolar macrophages (co-culture system) (Fig. 5A). We exposed the monoculture and co-culture systems to a selection of common carcinogens, including combustion products, carbons, clays, and various metal oxides, and performed live-cell imaging to follow mitotic progression (Fig. 5B, Table S7). We applied concentrations of carcinogens that were not overtly toxic, as the overall rates of mitotic division were largely unaffected by carcinogen exposure (Fig. 5C, Fig. S14A). In the monoculture system, 0 out of 13 tested carcinogens caused an increase in binucleation events (Fig. S14B). However, in the co-culture system, 5 out of 13 carcinogens, including 4 out of 4 combustion products, caused a significant increase in binucleation events (Fig. 5D). For instance, exposure to Printex90, a model for carbon particles found in soot as a result of the combustion of coal tar, petroleum, or other carbon-based materials, caused a 9% increase in binucleation events during a 24-hour period [t(7.38) = −4.33, p = 0.003]91,92. In total, these results suggest that common environmental carcinogens not only drive the development of point mutations but can also promote WGDs by triggering mitotic failure and binucleation.

Fig. 5: The effects of carcinogen exposure on mitosis.

A A schematic outlining the experiment design. B Representative live cell microscopy images illustrating examples of a normal mitosis and an abnormal mitosis leading to a binucleation event. The scale bar represents 10 µm and the scale is consistent across all images. C A box plot displaying the proliferation rate of lung epithelial cells co-cultured with alveolar macrophages after exposure to various carcinogens. The p value for the comparison between the control sample and NanoclayBent is 0.01. D A box plot displaying the rate of binucleation events in lung epithelial cells co-cultured with alveolar macrophages after exposure to various carcinogens. For the boxplots in (C) and (D), the center represents the median values, the whiskers define the minimum and maximum values, and bounds of the box reflect the 25th and 75th percentiles of values obtained for each compound. Statistical testing for each carcinogen exposure [DieselN1650 (n = 3), DieselN3 (n = 4), ExhaustN1 (n = 4), Printex90 (n = 4), GON059 (n = 4), MWCNT401 (n = 4), MWCNTN006 (n = 5), HNTNN (n = 4), NanoclayBent (n = 3), Fe2O3N018 (n = 3), SiO214 (n = 3), TiO2nt (n = 3), all biological replicates] was performed via pairwise two-tailed t-tests to reference control (n = 8, biological replicates). The p-value for the comparison between the control sample and DieselN1650 is 0.027, between the control sample and DieselN3 is 0.027, between the control sample and ExhaustN1 is 0.009, between the control sample and Printex90 is 0.003 and between the control sample and HNTNN is 0.038. E A volcano plot showing the mutagen class attributed proportions associated with self-reported racial group. Differences in the proportion of mutagen-attributed mutations between self-reported Black and white racial groups were tested by two-tailed Welch’s t-tests followed by Benjamini-Hochberg correction135. Red points indicate q-value < 0.05.
F A table displaying the proportions of mutagen-attributed mutations by mutagen signature class between self-reported Black and white patients. Differences in the proportion of mutagen-attributed mutations between self-reported Black and white racial groups were tested by two-tailed Welch’s t-tests followed by Benjamini-Hochberg correction135. Acronyms: polycyclic aromatic hydrocarbons (PAHs), reactive oxygen species (ROS), nitric oxide species (NOS). Mutagen class names are maintained from ref. 96, such that Control refers to mutations observed in untreated controls, Other refers to mutations from mutagens that did not fit into any specific mutagen class of interest, and Unassigned refers to mutations not corresponding to any mutagen classes including Control. The bolded numbers indicate q-values below 0.05. Source data are provided as a Source Data file. Statistical significance: NS p ≥ 0.05, * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001. Figure 5A was created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).

Signatures of carcinogen exposure in lung tumors from self-reported Black and white patients

We speculated that differences in environmental carcinogen exposure could influence the development of WGDs in Black and white cancer patients. While nationwide disparities in pollution exposure between Black and white Americans are well-documented, we do not know the exposure histories of the patients in our sequencing cohorts93,94,95. In order to investigate whether Black and white patients in our cohorts exhibited evidence of differential pollution exposure, we analyzed carcinogen-associated mutational signatures in the NSCLC tumors from TCGA. Based on prior work, we established signatures for classes of common mutagens including radiation, alkylating agents, and heterocyclic amines96. Interestingly, we found that lung tumors from self-reported Black patients exhibited a significant increase in the proportion of mutations associated with polycyclic aromatic hydrocarbon (PAH) exposure (Fig. 5E, F). PAHs are a common urban pollutant produced by burning carbon, and combustion products like Printex90 and diesel exhaust are significant sources of PAH exposure97,98,99,100. In contrast, lung tumors from self-reported white patients exhibited evidence of higher radiation and aldehyde exposure (Fig. 5E, F). While we do not know the specific exposure histories of these patients, this analysis suggests that Black cancer patients may have been exposed to different environmental pollutants than white cancer patients. Notably, we found evidence of a combustion-associated mutational signature in lung tumors from self-reported Black patients, and we found that combustion byproducts were sufficient to trigger mitotic failure in cultured lung epithelial cells.

Discussion

In this work, we found that self-reported Black or African American cancer patients exhibited a significantly greater incidence of WGD events compared to white cancer patients. This discrepancy was detectable in both a pan-cancer analysis and in several individual cancer subtypes. Historically, most research on genetic differences between cancer patient populations has focused on single-nucleotide point mutations; our work demonstrates the existence of significant, outcome-associated differences in patterns of chromosomal alterations as well30,31,32,33. We speculate that analyzing other types of genetic or epigenetic alterations in cancer (e.g., methylation patterns, smaller CNAs, intratumoral heterogeneity, etc.) may reveal additional informative differences101.

Our work is consistent with previous reports that documented an increased incidence of WGD events among African ancestry prostate cancer patients in sub-Saharan Africa102,103. However, our analysis lacked sufficient numbers to recapitulate this finding in prostate cancer specifically, as the TCGA cohort only included seven Black patients with this cancer (Fig. S13, Table S1). More broadly, while African American patients are significantly underrepresented in most genomic studies, dedicated sequencing efforts specifically designed to assess underrepresented patient populations have uncovered a wealth of new cancer drivers and vulnerabilities, illustrating the power of these focused efforts38,41,103. Notably, the cancer types in which self-reported Black patients exhibit frequent WGD events (breast, endometrial, prostate, and NSCLC) are also among those that have been recognized as exhibiting the most significant racial disparities in patient incidence or outcome13,19,20,33. Furthermore, we found that, among WGD-positive cancer patients, there were no differences in overall survival times between self-reported Black and white patients, suggesting that WGDs may represent one mechanism underlying the disparate outcomes that have been demonstrated within the American healthcare system.

Our genetic analysis of tumor sequencing data revealed a strong association between TP53 mutations and WGD events in both self-reported Black and white patients. Consistent with previous results, we found that Black patients exhibited a higher incidence of TP53 mutations overall35,36,38. However, WGD events were still more common among self-reported Black patients with TP53-mutant cancers compared to self-reported white patients. Our work did not identify any significant differences in the genetic drivers of WGDs associated with patient race, suggesting that these events may be influenced by epigenetic or environmental factors. Furthermore, our preliminary evidence demonstrates that carcinogen exposure, particularly to combustion agents, results in WGD events in vitro, thereby providing a potential mechanistic link between known social determinants of health and aggressive disease. This is particularly important because African Americans in the US are more likely to live in areas with higher rates of carcinogenic air pollution, including diesel exhaust, which could affect the development of WGDs in lung cancer25,26,104. Additional work will be required to verify the source(s) of the disparate rates of WGDs between patient populations.

The reason why carcinogen exposure triggered mitotic failure only in epithelial cells co-cultured with macrophages is at present unclear. It has previously been observed that epithelial cells are capable of tolerating foreign particulate matter, in part by upregulating lipid metabolism genes and sequestering the particles at the cell membrane. In contrast, the same particulate matter exposure results in cell death in macrophages, which causes the re-release of the particulate matter into the surrounding environment and the secretion of pro-inflammatory cytokines89. We speculate that certain paracrine signals from the macrophages may be affecting mitosis in the epithelial cells, though the identity of those signals is at present unknown.

Finally, the increase in WGD events among self-reported Black patients that we have documented has the potential to influence patient staging and treatment. We found that WGD-positive tumors were more likely to have spread to regional or distant lymph nodes, which may warrant additional surveillance and interventions among Black patients. More broadly, chromosomal alterations are strongly associated with patient outcome, and analyzing tumor karyotypes as part of a standard pathological workup may improve our ability to preemptively detect aggressive disease45,105,106,107,108,109. Additionally, recent research has demonstrated that WGD-positive cancer cells harbor unique genetic vulnerabilities. For instance, alterations in the mitotic apparatus resulting from WGD events cause cells to become dependent on the mitotic kinesin KIF18A, which is otherwise dispensable in diploid cells70,110. AMG650 is a small-molecule inhibitor of KIF18A that has entered Phase I clinical trials, underscoring the recent progress toward selectively targeting WGD-positive cancers111. Therapies designed to selectively target WGD-positive tumors may be particularly effective in Black cancer patients and could serve to ameliorate the disparate racial outcomes in cancer mortality.

Methods

Data acquisition

All genomic data analyzed in this manuscript have already been published in the sources described below. Our use of de-identified and published clinical and genomic data complies with all relevant ethical regulations. Mutational, copy number, sample, and patient data were downloaded from the cBioPortal datahub (https://github.com/cBioPortal/datahub)112,113. Whole genome duplication determinations were sourced from the original cohort study (MSK-MET and PCAWG) or from subsequent analyses (TCGA)44,52,60. A sample was considered to have undergone WGD based on processing by FACETS (MSK-MET), consensus across 10 different methods (PCAWG), or ABSOLUTE (TCGA)60,114,115. Genetic ancestry determinations for the MSK-MET cohort were graciously provided by Kanika Arora, following the method from ref. 34. Racial data were determined by electronic health record review (MSK-MET) and by interview during enrollment, where patients were asked to select from the racial categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau (TCGA). Regional lymph node metastasis data were sourced from ref. 58. Mutagen signatures of environmental agents were obtained from the original study96,116. Sample size calculations were not possible as our analysis was limited by the availability of published copy number data from patient cancers.

Data harmonization and cleaning

Data were harmonized across cohorts by including only lesions with known WGD status in the sample data, considering only genomic aberrations in genes found in the IMPACT-505 gene set in the mutational and copy number data, and retaining patients with known sex and with either self-reported race or inferred ancestry in the patient data, depending on which was available. In MSK-MET, patients with self-reported race of Asian-far east/Indian (Asian), Black or African American (Black), or white and patients with inferred ancestry of either African or European were maintained separately; in TCGA, patients with self-reported race of Asian, Black, or white were maintained; in PCAWG, patients with inferred ancestry of African or European were maintained. With the exception of the analysis of WGD frequency in metastatic samples, all included samples were from primary lesions with determinable WGD status and known sex, from patients with self-reported race of Asian, Black, or white (MSK-MET and TCGA) or inferred genetic ancestry of African or European (MSK-MET and PCAWG). Excluded samples were those from metastatic lesions, those without determinable WGD status, those with unknown sex, and those from a racial group or genetic ancestry other than those explicitly included. Metastatic samples included in the supplemental analysis were MSK-MET samples with determinable WGD status and known sex from patients with self-reported race of Black or white. Metastatic samples were otherwise excluded because nearly all specimens from TCGA and PCAWG are from primary tumors, and excluding the metastatic samples allowed us to perform a more representative comparison across patient databases. Samples that lacked known WGD status or the additional demographic information listed above were excluded because they were not informative for our overall research question regarding the association between WGD status and patient race.

To be included in our analysis, genomic aberrations had to occur in ≥200 patients across racial groups and ≥2% of patients within a racial group. We focused mutational analysis on nonsynonymous mutations and trichotomized copy number levels to neutral, loss, and gain. As the cohorts we analyzed do not use the same identifiers for cancer type, we considered all cancer types regardless of identifier in pan-cancer analyses, while standardizing across cohorts for comparisons by cancer type. Standardization was as follows: breast cancer comprised samples listed as BRCA (TCGA) or Breast Cancer (MSK-MET); endometrial cancer comprised samples listed as UCEC (TCGA), Endometrioid Adenocarcinoma (MSK-MET), or Serous Carcinoma (MSK-MET); non-small cell lung cancer (NSCLC) comprised samples listed as LUAD (TCGA), LUSC (TCGA), or Non-Small Cell Lung Cancer (MSK-MET); prostate cancer comprised samples listed as PRAD (TCGA) or Prostate Cancer (MSK-MET). A positive determination for regional lymph node metastasis was made according to the presence of “regional_lymph” in the “met_site_mapped” variable from ref. 58. The code used to perform this analysis is available at https://github.com/sheltzer-lab/wgd_disparities.
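The standardization and inclusion rules above can be illustrated with a minimal Python sketch. It is not taken from the linked repository: the function names, the short standardized labels, and the input layout are hypothetical, and the reading that the 2% threshold must hold in every racial group is an assumption, since the text does not specify whether one group or all groups must meet it.

```python
# Cohort-specific labels mapped to the standardized cancer types named in the text.
# The standardized label strings ("Breast", "NSCLC", ...) are illustrative choices.
CANCER_TYPE_MAP = {
    ("TCGA", "BRCA"): "Breast",
    ("MSK-MET", "Breast Cancer"): "Breast",
    ("TCGA", "UCEC"): "Endometrial",
    ("MSK-MET", "Endometrioid Adenocarcinoma"): "Endometrial",
    ("MSK-MET", "Serous Carcinoma"): "Endometrial",
    ("TCGA", "LUAD"): "NSCLC",
    ("TCGA", "LUSC"): "NSCLC",
    ("MSK-MET", "Non-Small Cell Lung Cancer"): "NSCLC",
    ("TCGA", "PRAD"): "Prostate",
    ("MSK-MET", "Prostate Cancer"): "Prostate",
}


def standardize_cancer_type(cohort, label):
    """Return the standardized cancer type, or None if the label is not mapped."""
    return CANCER_TYPE_MAP.get((cohort, label))


def frequent_aberrations(carriers_by_group, group_sizes,
                         min_patients=200, min_fraction=0.02):
    """Keep aberrations seen in >= min_patients overall and >= min_fraction per group.

    carriers_by_group: {aberration: {group: number of patients carrying it}}
    group_sizes:       {group: total number of patients in that group}
    Assumption: the per-group fraction must be met in EVERY group.
    """
    kept = []
    for aberration, counts in carriers_by_group.items():
        total = sum(counts.values())
        fractions = [counts.get(group, 0) / size for group, size in group_sizes.items()]
        if total >= min_patients and all(f >= min_fraction for f in fractions):
            kept.append(aberration)
    return sorted(kept)
```

With this layout, an aberration carried by 255 patients overall but by only 0.5% of one group would be dropped, as would one carried by fewer than 200 patients in total.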

Software versions

Analyses were performed using R (version 4.2.2)117, PRISM (version 9.4.1), and Python (version 3.10.8)118. R packages: car (version 3.1-2)119, EnhancedVolcano (version 1.16.0)120, ggplot2 (version 3.4.2)121, gt (version 0.9.0)122, gtsummary (version 1.7.2)123, maftools (version 2.14.0)124, openxlsx (version 4.2.5.2)125, reshape (version 1.4.4)126, tidyverse (version 2.0.0)127. Python packages: lifelines (version 0.27.4)128, matplotlib (version 3.6.2)129, and pandas (version 1.5.2)130. Carcinogen analyses and live cell imaging software included Imspector (version 16.2.8282-metadata-win64-BASE) provided by Abberior Instruments131; Fiji, ImageJ 1.52p (NIH)132; Mathematica 12.0, license L5063-5112 (Wolfram)133. Mutagen signature attributions were determined by signature.tools.lib (v2.4.4)134.

Whole-genome duplication frequency analysis of MSK-MET, TCGA, PCAWG

The frequency of WGD was calculated as the number of WGD-positive samples over the total sample count within a self-reported racial group, inferred ancestry, or self-reported sex at a pan-cancer level and additionally subset by cancer type; testing for association was done via two-tailed Pearson’s Chi-squared test. WGD frequencies by cancer type were compared between MSK-MET, TCGA, and PCAWG via Pearson correlation.
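As an illustration of this calculation for a two-group comparison, the sketch below computes a WGD frequency and a Pearson’s Chi-squared test on a 2 × 2 table using only the Python standard library. The function names are hypothetical, and the sketch assumes no continuity correction is applied; the text does not state whether one was used.

```python
import math


def wgd_frequency(n_wgd_positive, n_total):
    """Fraction of samples in a group that are WGD-positive."""
    return n_wgd_positive / n_total


def pearson_chi2_2x2(a, b, c, d):
    """Pearson's chi-squared test for the table [[a, b], [c, d]].

    Rows could be two racial groups; columns WGD-positive and WGD-negative counts.
    Assumption: no continuity correction. Returns (statistic, p-value), 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `pearson_chi2_2x2(60, 40, 40, 60)` compares a group with 60% WGD-positive samples against one with 40%, using 100 samples in each.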

Whole-genome duplication frequency in TCGA-exclusive patients

To analyze independent patients within the TCGA dataset, shared samples between TCGA and PCAWG were removed. The frequency of WGD was calculated as the number of WGD-positive samples over the total sample count (TCGA-Exclusive) within a self-reported racial group for pan-cancer and cancer type analyses; testing for association was done via two-tailed Pearson’s Chi-squared test and Fisher’s exact test across racial groups.
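Fisher’s exact test is typically preferred when some cell counts are small, as can happen after shared samples are removed. The sketch below shows a two-sided version for a 2 × 2 table, assuming the common convention of summing the probabilities of all tables that are no more likely than the observed one; the function name is hypothetical and this is not the authors’ code.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * tolerance)
    return min(1.0, total)
```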

Whole-genome duplication frequency analysis by metastatic status and cancer stage

The frequency of WGD was calculated as the number of WGD-positive samples over the total sample count within a self-reported racial group by metastatic status (MSK-MET) and stage (TCGA) for each cancer type; testing for association was done via two-tailed Pearson’s Chi-squared test. We used the staging information available in TCGA, which includes pathological staging for breast cancer and NSCLC and clinical staging for endometrial cancer. Metastatic status and cancer staging distributions were summarized within each racial group by cancer type; testing for association was done via two-tailed Pearson’s Chi-squared test and Fisher’s exact test across racial groups.

Whole-genome duplication frequency analysis by histological subtype

Shared cancer types between MSK-MET and TCGA were aggregated. For breast cancer, IDC includes MSK-MET histological designations HR+/HER2+ Ductal Carcinoma, HR+/HER2- Ductal Carcinoma, HR-/HER2+ Ductal Carcinoma, Ductal Triple Negative Breast Cancer (TNBC) and TCGA histological designations Ductal Luminal A, Ductal Luminal B, Ductal HER2-enriched, Ductal Basal-like, Ductal Normal-like. ILC includes MSK-MET histological designations HR+ Lobular Carcinoma and TCGA histological designations Lobular Luminal A, Lobular Luminal B, Lobular HER2-enriched, Lobular Basal-like, Lobular Normal-like. For endometrial cancer, shared cancer types between MSK-MET and TCGA included endometrioid and serous subtypes. For NSCLC, shared cancer types between MSK-MET and TCGA included adenocarcinoma and squamous cell carcinoma. The frequency of WGD was calculated as the number of WGD-positive samples over the total sample count within a self-reported racial group by shared histological subtype for each cancer type; testing for association was done via two-tailed Pearson’s Chi-squared test across racial groups.

Concordance between inferred genetic ancestry and self-reported race

Inferred genetic ancestry designations (African and European) for each patient of the MSK-MET cohort were determined following the method described in ref. 34 and were maintained separately from self-reported race. Concordance between inferred genetic ancestry and self-reported race was determined as the proportions of African ancestry patients who self-identified as “Black” and European ancestry patients who self-identified as “white”.

Correlation of fractional ancestry to WGD

Fractional ancestries (AFR and EUR) for all MSK-MET patients regardless of majority inferred genetic ancestry were binned then correlated to the binned rates of WGD. Separately, AFR and EUR fractional ancestry were binned into 5%-sized bins, then within each bin, both the rate of WGD and number of patients were calculated. We then ran a logistic regression between bin number and rate of WGD, weighted by the number of patients, to test for a relationship.
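The binning and weighted regression can be sketched as follows. This is a minimal standard-library illustration rather than the authors’ implementation: the function names and input layout are hypothetical, and it assumes the weighted logistic regression is fit by maximum likelihood (here via Newton-Raphson) with the bin-level WGD rate as the response and the patient count as the weight.

```python
import math


def bin_wgd_rates(patients, bin_width_pct=5):
    """Group patients into fractional-ancestry bins.

    patients: iterable of (fractional_ancestry in [0, 1], wgd_positive as bool).
    Returns {bin_number: (wgd_rate, n_patients)}; a fraction of 1.0 joins the top bin.
    """
    n_bins = 100 // bin_width_pct
    counts = {}
    for fraction, wgd in patients:
        # Rounding guards against floating-point error at bin edges.
        b = min(int(round(fraction * 100, 9) // bin_width_pct), n_bins - 1)
        pos, tot = counts.get(b, (0, 0))
        counts[b] = (pos + int(wgd), tot + 1)
    return {b: (pos / tot, tot) for b, (pos, tot) in sorted(counts.items())}


def weighted_logistic_fit(x, rate, weight, n_iter=50):
    """Fit rate ~ sigmoid(b0 + b1 * x), weighting each bin by its patient count.

    Returns (b0, b1, two-sided Wald p-value for b1).
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri, wi in zip(x, rate, weight):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            v = wi * p * (1.0 - p)
            g0 += wi * (ri - p)          # gradient, intercept
            g1 += wi * (ri - p) * xi     # gradient, slope
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step = inverse information x gradient
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    se_slope = math.sqrt(h00 / det)
    z = b1 / se_slope
    return b0, b1, math.erfc(abs(z) / math.sqrt(2))
```

The bin numbers and rates returned by `bin_wgd_rates` can be passed directly as `x` and `rate`, with the patient counts as `weight`.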

Logistic regression of genomic aberrations to WGD

In MSK-MET, in order to determine which genomic aberrations correlated with increased rates of WGD, we built multivariate logistic regression models in R for self-reported Black patients only, self-reported white patients only, and all self-reported Black and white patients (i.e., all patients). All frequently occurring genomic aberrations (≥200 patients and ≥2% patients) were included in the models and were coded as binary predictor variables. Using a similar method to ref. 52, we reduced the models by removal of covariates in two stages: 1) removal of aliased covariates (i.e., those with perfect correlation to another covariate), and 2) recursive removal of the covariate with the highest variance inflation factor (VIF) until all covariates had a VIF ≤ 4 in order to remove multicollinearity. Cancer types were included in the final models. Significance of covariates was tested by Wald test followed by FDR correction via Benjamini & Hochberg’s method135.
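The two-stage covariate reduction can be sketched in Python as below. This is an illustration under stated assumptions, not the R code used in the study: the function names are hypothetical, stage 1 detects only pairwise perfect correlation (R’s alias detection also catches more general linear dependence), and the VIF is computed in its standard linear-model form as the diagonal of the inverse correlation matrix, which may differ from the quantity the car package reports for logistic models. Constant columns are not handled.

```python
def _pearson(u, v):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def _invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF per covariate: diagonal of the inverse of the correlation matrix."""
    names = list(columns)
    corr = [[_pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = _invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def prune_by_vif(columns, threshold=4.0):
    """Stage 1: drop aliased covariates. Stage 2: drop highest VIF until all <= threshold."""
    cols = dict(columns)
    names = list(cols)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in cols and b in cols and abs(_pearson(cols[a], cols[b])) > 1 - 1e-12:
                del cols[b]  # keep the first of each perfectly correlated pair
    while len(cols) > 1:
        current = vifs(cols)
        worst = max(current, key=current.get)
        if current[worst] <= threshold:
            break
        del cols[worst]
    return list(cols)
```

Recomputing every VIF after each removal matters because dropping one covariate changes the collinearity of all the others.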

Aneuploidy burden and chromosomal arm-level differences by self-reported race and WGD status

Total aneuploidy score, defined as the total number of chromosome arm gains and losses, was calculated for each patient within MSK-MET and TCGA. Differences in total aneuploidy score by self-reported race and WGD status were calculated using unpaired t-tests at a pan-cancer level and within each cancer type. Chromosome arm-level frequencies were calculated as the number of samples with chromosome arm aneuploidy (either gain or loss) over the total number of samples with determined arm status (gain, loss, or neutral) for each respective available chromosome arm within each self-reported racial group on a pan-cancer and cancer type level. Testing for association of specific aneuploidies was done via two-tailed Pearson’s Chi-squared test.
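A minimal sketch of the two quantities defined above is shown here. The input layout (a dictionary of arm-level calls per sample, with `None` for undetermined arms) and the function names are assumptions made for illustration.

```python
def aneuploidy_score(arm_calls):
    """Total number of chromosome arm gains and losses in one sample.

    arm_calls: {arm name: "gain", "loss", "neutral", or None if undetermined}.
    """
    return sum(1 for call in arm_calls.values() if call in ("gain", "loss"))


def arm_aneuploidy_frequency(samples, arm):
    """Fraction of samples with a gain or loss of `arm`, among samples with a determined call."""
    determined = [s[arm] for s in samples if s.get(arm) is not None]
    if not determined:
        return None
    return sum(call in ("gain", "loss") for call in determined) / len(determined)
```

Samples without a determined call for an arm are left out of that arm’s denominator, matching the definition in the text.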

TP53 analysis

Nonsynonymous mutations in TP53 were analyzed across racial groups, WGD status, and cancer type. Samples containing at least one nonsynonymous mutation in TP53 were considered TP53-mutant, while the remaining samples were considered TP53-WT. The frequency of TP53 mutation was calculated as the proportion of TP53-mutant samples over the total sample count within a racial group at a pan-cancer level and additionally subset by cancer type and/or WGD status; testing for association was done via two-tailed Pearson’s Chi-squared test across self-reported racial groups. Variant classification distributions were summarized within each racial group at a pan-cancer level and additionally subset by cancer type; testing for association was done via two-tailed Pearson’s Chi-squared test across self-reported racial groups. Locations of TP53 mutations were summarized within each racial group and visually compared across self-reported racial groups.

CCNE1 analysis

Gains and losses of CCNE1 were analyzed across racial groups, WGD status, and cancer types. Samples demonstrating amplifications of CCNE1 were considered CCNE1 gain. All remaining samples demonstrating either loss of CCNE1 or no alteration were considered CCNE1 neutral/loss. The frequency of CCNE1 gains was calculated as the proportion of CCNE1-gain samples over the total sample count within a racial group at a pan-cancer level and additionally subset by cancer type and/or WGD status; testing for association was done via two-tailed Pearson’s Chi-squared test.

Whole-genome duplication and tumor response and status

Clinical data including tumor response for first course of treatment and tumor status at the end of the clinical observation period were sourced from TCGA. Tumor responses included for analysis were “Complete Remission/Response”, “Partial Remission/Response”, “Stable Disease”, and “Progressive Disease”; the remaining responses were not included for analysis as these categories represented disputed, uncertain, or missing clinical responses. The frequency of progressive disease was calculated as the proportion of patients with “Progressive Disease” over the total number of patients with included tumor responses by WGD status at a pan-cancer and cancer type level. For tumor status, patients with documented “TUMOR_FREE” or “WITH_TUMOR” were included for analysis; remaining patients with missing data were excluded. The frequency of patients with persistent disease at the end of TCGA’s clinical observational period was calculated as the proportion of patients “WITH_TUMOR” over the total number of patients with documented tumor status by WGD status at a pan-cancer and cancer type level. The definitions of these variables were provided in ref. 136. Testing for association was done via two-tailed Pearson’s Chi-squared test.

Cell culture

The murine epithelial lung tissue cell line (LA-4; cat. no. ATCC CCL-196) and the murine alveolar lung macrophage cell line (MH-S; cat. no. CRL2019) were purchased from American Type Culture Collection (ATCC) and cultured according to ATCC instructions. Cells were cultured in TPP cell culture flasks at 37 °C in a 5% CO2 humidified atmosphere until monolayers reached desired confluency. All experiments were performed with cells before the twentieth passage. For long-term live cell experiments, a stage-top incubator that maintains a humidified atmosphere with 5% CO2 and is heated to 37 °C was used. The medium used for culturing of the epithelial LA-4 cells was Ham’s F-12K medium (Gibco) supplemented with 15% FCS (ATCC), 1% P/S (Sigma), 1% NEAA (Gibco), 2 × 10⁻³ M L-Gln. For the alveolar macrophage cell line MH-S, RPMI 1640 (Gibco) medium supplemented with 10% FCS (ATCC), 1% P/S (Sigma), 2 × 10⁻³ M L-Gln, and 0.05 × 10⁻³ M beta-mercaptoethanol (Gibco) was used. No commonly misidentified cell lines were used in this study.

Carcinogen materials for exposure assays

The environmental and engineered particulate matter used in this study are summarized in Table S7. These included four particles produced by fuel combustion (three diesel exhaust samples, one carbon black sample), three engineered carbonaceous particulate matters (graphene oxide, two multiwall carbon nanotubes), two clay samples, and four metal oxides89,137,138,139,140,141,142,143,144,145,146,147,148,149. TiO2 nanotubes were synthesized by ref. 150. Printex 90 was kindly provided by Evonik, Frankfurt, Germany. NM-401 MWCNT (MWCNTs-NM401-JRCNM04001a) were a kind gift from JRC Nanomaterial Repository. All other materials, except for the commercially available DieselN1650, were obtained through the EU project nanoPASS from Prof. Ulla Vogel (NRCWE, Copenhagen, Denmark).

Particulate matter preparation

Cup horn sonication was employed to disperse particulate matter in a low osmolarity, high pH buffer solution [vehicle: 1 mM bicarbonate buffer (100 times diluted bicarbonate buffer), pH of 10] to minimize charge screening of the particulate matters’ active surfaces. To ensure uniform dispersion, particulate matter was resuspended to contain 3 cm² of the particulate matter surface per 3 µL. The resulting suspensions underwent cup horn sonication in an ice bath for 15 min of active sonication, using a 5 s on and 5 s off regimen for a total duration of 30 min, at a power setting of 20–30 W (amplitude 70) to guarantee optimal dispersion. Prior to microscopy, a volume of particulate matter dispersion containing a particulate matter surface area 10 times larger than the surface area of the cell culture well was added in a dropwise manner. The volume of particulate matter applied to cells represented 3% of the final cell media volume.

Live cell microscopy

A combination of fluorophores was used to label structures of interest in cells. We labeled murine epithelial lung tissue cells with CellTracker™ Green CMFDA (CTG, Thermo Fisher (#C2925), excitation peak 492 nm, emission peak 517 nm, 1 μM), CellMask™ Deep Red (CMDR, Thermo Fisher (#C10046), excitation peak 650 nm, emission peak 685 nm, 0.5 μg/mL), and Abberior LIVE 550 Tubulin (Abberior, (#LV550), excitation peak 551 nm, emission peak 573 nm, 100 nM). We labeled murine alveolar lung macrophages with CellTracker™ Orange CMRA dye (CTO, Thermo Fisher (#C34551), excitation peak 548 nm, emission peak 576 nm, 1 μM). The CTG and CTO fluorophores were added one day prior to microscopy, while the other fluorophores were added immediately before imaging. Additionally, only CTG and CTO were washed out with PBS, whereas the other labels were not. Live cell microscopy was performed using the STED microscope by Abberior Instruments in confocal mode. An inverted microscope body (Olympus IX83) was equipped with a stage top incubator (Okolab H301-MIN), which maintains an atmosphere at 37 °C with 5% CO2 and at least 95% humidity to enable long-term imaging of living cells. Images were captured with a 20× magnification, 0.8 numerical aperture (NA) lens. The microscope system incorporates four pulsed laser sources with a pulse duration of 120 ps and a maximum power of 50 µW at the sample plane. Four avalanche photodiode detectors are utilized for signal detection. We detected particulate matter in the label-free, backscatter detection mode, utilizing the 488/488 ± 5 nm excitation/detection.

Carcinogen exposure assays

Cells labeled with live cell compatible fluorophores were exposed to particulate matter for a total of 24 h as detailed in ref. 89. Cytotracker was added one day prior to microscopy and removed prior to microscopy. The remaining fluorophores were added immediately prior to imaging. Murine lung epithelial cells in monoculture or in coculture with murine macrophages were exposed to individual particulate matters immediately prior to imaging. For each particulate matter, 3–4 biological replicates were performed (cells with the next (+1) passage number, seeded and measured 3–4 days later), each with 1–3 technical replicates (cells with the same passage number, but seeded in a neighboring well and measured in parallel on the same day). Results were highly reproducible across biological and technical replicates. Control samples were cells labeled with all fluorophores and exposed to 3 μL of vehicle buffer without particulate matter (1 mM (100 × diluted) bicarbonate buffer only), representing 3% of the total cell medium volume. 24-h time-lapse fluorescence and scattering microscopy images were quantified using the standard quantification algorithms of the Infinite platform (version 42), written in Python and Mathematica, to derive: 1) cell proliferation (number of cells) and 2) binucleate cell formation (fraction of bi- and multinucleate epithelial cells). Rates of change in proliferation and binucleation were averaged across all available biological and/or technical replicates. Changes in cellular responses to carcinogen exposures were tested by pairwise two-tailed t-tests against the reference control. The carcinogen exposure assays were performed in a blinded manner: the experimenter was not aware of the identity of the material or its properties during the data collection or analysis stages of the assay.

Mutagen signature attributions

To extract mutagen signature attributions from TCGA NSCLC samples, the “signatureFit_pipeline” function of signature.tools.lib was run using the Mutagen53 catalog from Kucab et al. with the settings: genome.v = “hg19”, randomSeed = 6206, fit_method = “Fit”, threshold_p.value = 0.05, optimisation_method = “KLD”, useBootstrap = FALSE, exposureFilterType = “fixedThreshold”, threshold_percent = 5, threshold_nmuts = 10, multiStepMode = “errorReduction”, and minErrorReductionPerc = 15 (ref. 96). Results were then saved via ‘plotFitResults’ from the same package. These results were normalized across samples by dividing the number of assignments per mutagen signature in a sample by the total number of assignments for that sample. The normalized assignments were then summed within the mutagen classes as defined by ref. 96. We then compared these mutagen class assignments across self-reported racial groups using two-tailed Welch’s t-tests followed by false discovery rate correction using Benjamini-Hochberg’s method135.
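The normalization, class-level summation, and Benjamini-Hochberg correction described above can be illustrated with a short Python sketch. It is not the signature.tools.lib pipeline, which is an R package; the function names, signature labels, and class labels below are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def normalize_assignments(counts):
    """Divide each signature's assignment count by the sample's total assignments."""
    total = sum(counts.values())
    if total == 0:
        return {sig: 0.0 for sig in counts}
    return {sig: n / total for sig, n in counts.items()}


def sum_by_class(normalized, sig_to_class):
    """Sum normalized assignments within each mutagen class."""
    totals = defaultdict(float)
    for sig, frac in normalized.items():
        totals[sig_to_class[sig]] += frac
    return dict(totals)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-sample assignment counts and class mapping.
    sample = {"sigA": 30, "sigB": 10, "sigC": 60}
    classes = {"sigA": "class1", "sigB": "class1", "sigC": "class2"}
    print(sum_by_class(normalize_assignments(sample), classes))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In this sketch each sample's class-level values sum to 1, so group comparisons reflect the relative contribution of each mutagen class rather than the total mutation burden.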

Survival analysis

Survival analysis was performed in Python, using the packages: lifelines, matplotlib, and pandas. Significance was tested via logrank test.
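To make the log-rank statistic concrete, the sketch below computes it for two groups using only the Python standard library. It is an illustration of the test, not the lifelines code used for the analysis (lifelines exposes it as `lifelines.statistics.logrank_test`); the function name and input layout here are assumptions.

```python
import math


def two_group_logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (event observed) or 0 (censored).

    Returns (chi-squared statistic with 1 degree of freedom, p-value).
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Two groups with identical follow-up give a statistic of 0 and a p-value of 1, while groups whose events are clearly separated in time give a small p-value.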

Clinical correlates of WGD and patient race

Ages at key clinical events (diagnosis, death, sequencing, and surgical procedure) were compared across self-reported racial groups and WGD status via Wilcoxon rank-sum test. All ages except age at diagnosis were directly reported by MSK-MET. Age at diagnosis was inferred based on survival status: for deceased patients, it was calculated as age at death minus overall survival in months, while for living patients it was calculated as age at last contact minus overall survival in months. Microsatellite instability status was determined using the designated Stable and Instable determinations; the frequency of microsatellite instability (MSI-H) was determined as the number of Instable samples divided by the total number of Stable and Instable samples within a self-reported racial group or WGD status; testing for association was done via two-tailed Pearson’s Chi-squared test. Staging information for the MSK-MET dataset was incomplete. The frequency of each metastasis location was determined as the number of samples with a recorded metastasis at that location divided by the total number of samples from a self-reported racial group or WGD status; testing for association was done via two-tailed Pearson’s Chi-squared test. For those metastases occurring in only a subset of patients (e.g., Female Genital and Male Genital), only samples contained within the same subset were considered in our calculations.
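The MSI-H frequency and the association test can be sketched for the simplest case, a 2×2 table comparing two groups. The sketch assumes Pearson's statistic without continuity correction, which is not stated in the text; the function names and the counts in the usage note are hypothetical.

```python
import math


def msi_h_frequency(n_instable, n_stable):
    """Fraction of Instable samples among samples called Stable or Instable."""
    return n_instable / (n_instable + n_stable)


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Rows are groups, columns are Instable and Stable counts.
    Assumes every row and column total is non-zero.
    Returns (chi-squared statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For hypothetical counts of 10 Instable and 90 Stable samples in one group versus 30 and 70 in another, the frequencies are 0.1 and 0.3 and the statistic is 12.5. Comparisons across more than two groups would need the general r×c form of the test.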

Data visualization

Scientific illustrations were assembled using BioRender (Fig. 1A and Fig. 5A). Graphs and scatterplots were generated using GraphPad Prism.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.