Introduction

The rising prevalence of diabetes presents a significant global health challenge, with projections indicating that by 2030, approximately 10.2% of the world’s population will be living with the disease1. Type 2 diabetes mellitus (T2DM) accounts for over 95% of all diabetes cases, is associated with a range of complications and co-morbidities2, and has been identified as a risk factor for breast cancer3,4. Women with pre-existing diabetes may be at higher risk of late-stage tumours, larger tumour size, and lymph node involvement than women without a prior diabetes diagnosis5. However, the factors that could affect survival among women with T2DM and breast cancer are complex and not fully understood, given the varying quality of the evidence and the limitations of existing studies5. In vitro studies have shown that hyperglycaemia and hyperinsulinaemia can increase breast cancer cell proliferation, invasiveness, and migration6. Moreover, one study found that high glucose levels enhanced migration of oestrogen receptor-positive breast cancer cells, suggesting that hormone receptor pathways may be involved in mediating the response of breast cancer cells to metabolic changes in the microenvironment7.

Breast cancer is the most common malignancy among women globally, and its incidence is increasing8,9. In Scotland, one in eight women is diagnosed with breast cancer during their lifetime10. Breast cancer can be classified into one of four intrinsic molecular subtypes based on gene expression; each has distinct molecular characteristics that provide information on prognosis and the optimal course of treatment11,12. The St. Gallen Expert Panel recommends approximating the intrinsic molecular subtypes with immunohistochemistry (IHC) surrogate markers: oestrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and either the proliferation marker Ki-67 or tumour grade13. Based on the St. Gallen recommendations, five surrogate subtypes can be defined: luminal A-like, luminal B-like (HER2-), luminal B-like (HER2+), HER2-enriched, and triple-negative breast cancer (TNBC)13. Luminal-like cancers, which are typically responsive to hormone therapy, are the most common subtypes in populations of European ancestry14. TNBC is an aggressive subtype that many studies have shown to have a poor prognosis and limited targeted therapy options15.

Scotland has collected high-quality electronic health records for several decades. The Scottish Care Information (SCI)-Diabetes dataset has had nationwide coverage since 2006 and is estimated to capture over 99% of people with a diabetes diagnosis in Scotland16. SCI-diabetes is updated daily, and all general practices and hospital clinics in the country contribute data, yielding a cohort with over 4 million person-years of follow-up16. In 2020, 88.1% of individuals in the dataset had a T2DM diagnosis16. The Scottish Cancer Registry (SMR06) is a national electronic database that has existed since 1981, with 99% completeness for breast cancer17,18. ER data collection began in 1997, and PR and HER2 data collection began in 200918. SMR06 also records the mode of breast cancer detection, sourced from Scotland’s national mammography programme, which invites women aged 50 to 69 for mammography every three years19. Another key prognostic factor included in SMR06 is the Scottish Index of Multiple Deprivation (SIMD; https://www.gov.scot/collections/scottish-index-of-multiple-deprivation-2020/), a nationwide area-based measure of socioeconomic status (SES). SIMD was established by the Scottish Government and is based on measures of income, employment, health, education, crime, access to services, and housing. People registered with a Scottish general practitioner are assigned a unique Community Health Index (CHI) number, an identifier used on all public health system records in the country that enables data linkage. Pseudonymised extracts of SCI-diabetes are linked to SMR06 in the Scottish Diabetes Research Network (SDRN) national dataset (NDS) 16.

This project aimed to investigate the relationship between distinct molecular subtypes and survival among women with T2DM and breast cancer in Scotland using a retrospective cohort study design.

Methods

Study cohort

The cohort consisted of women diagnosed with T2DM before diagnosis of a primary invasive breast cancer (International Classification of Diseases (ICD), 10th revision, code C50). T2DM diagnosis was obtained from linked primary and secondary care records16.

All women with T2DM aged 50 to 84 years with a breast cancer diagnosis between 2010 and 2019 were identified. Because SMR06 began recording PR and HER2 in 2009, only women diagnosed from 2010 onwards were included, allowing one year for collection of these variables to become established. The 2019 cut-off was chosen to avoid the effects of the COVID-19 pandemic, which caused delays in diagnosis and in data collection. The age restriction (50 to 84 years) was chosen to include women of breast cancer screening age (50 to 69 years) and to minimise missing data, which were more common in women over the age of 84 (data not shown).

Women with other primary malignant cancers before their breast cancer diagnosis were excluded from the analysis. Women were also excluded if their date of breast cancer incidence matched their date of death, or if their breast cancer diagnosis came only from a death certificate. In addition, women with missing data for any of the following variables were excluded: age, T2DM duration prior to breast cancer diagnosis, death censoring and cause, Scottish region, year of breast cancer diagnosis, ER and/or PR status, HER2 status, grade (only for women with luminal cancers), mode of detection, SIMD, and treatment information (surgery, chemotherapy, radiotherapy, and hormone therapy) (Fig. 1).
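The inclusion and exclusion rules above can be expressed as simple record-level filters. The following Python sketch is illustrative only (the study itself used R): the `Record` fields and function names are hypothetical, T2DM is assumed to have to precede breast cancer strictly, and a missing PR result is assumed to be tolerated when ER is positive, because hormone-receptor status is then already determined. The non-tumour covariates in the complete-case list (age, region, SIMD, treatments, and so on) would be checked the same way and are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    # Hypothetical, simplified record; field names are illustrative only.
    age_at_bc: int
    bc_date: date                 # breast cancer incidence date (SMR06)
    t2dm_date: Optional[date]     # earliest T2DM record (SCI-diabetes)
    death_date: Optional[date]
    prior_malignancy: bool
    death_certificate_only: bool
    er: Optional[bool]            # True = positive, False = negative, None = missing
    pr: Optional[bool]
    her2: Optional[bool]
    grade: Optional[int]          # 1, 2 or 3


def hormone_receptor_positive(er: Optional[bool], pr: Optional[bool]) -> Optional[bool]:
    """ER+ and/or PR+ -> True; ER- and PR- -> False; otherwise undetermined."""
    if er is True or pr is True:
        return True
    if er is False and pr is False:
        return False
    return None


def is_eligible(r: Record) -> bool:
    """Inclusion and exclusion rules applied before the complete-case step."""
    if r.t2dm_date is None or r.t2dm_date >= r.bc_date:
        return False  # T2DM must precede breast cancer (assumed strict)
    if not (50 <= r.age_at_bc <= 84 and 2010 <= r.bc_date.year <= 2019):
        return False
    if r.prior_malignancy or r.death_certificate_only:
        return False
    return r.death_date != r.bc_date  # drop same-day incidence and death


def is_complete_case(r: Record) -> bool:
    """Receptor status, HER2 and (for luminal tumours) grade must be known."""
    hr_pos = hormone_receptor_positive(r.er, r.pr)
    if hr_pos is None or r.her2 is None:
        return False
    return r.grade is not None if hr_pos else True
```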

Fig. 1

Flowchart of the complete case cohort. The cohort consisted of women with T2DM aged 50 to 84 years with breast cancer diagnosis between 2010 and 2019 in Scotland.

Definitions

Breast cancer

Breast cancer receptor status was assigned by IHC staining, and molecular subtypes were classified according to the St. Gallen criteria, using tumour grade (from I, well differentiated, to III, poorly differentiated) as a proxy for Ki-67 in luminal tumours because Ki-67 is not used routinely in Scotland13. Luminal A tumours were defined as ER+ and/or PR+/HER2- (grade I/II), luminal B (HER2-) tumours as ER+ and/or PR+/HER2- (grade III), luminal B (HER2+) tumours as ER+ and/or PR+/HER2+, HER2-enriched tumours as ER-/PR-/HER2+, and TNBC tumours as ER-/PR-/HER2-.
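The surrogate definitions above amount to a small decision rule. A hedged Python sketch follows; the function names, the True/False/None encoding of marker results, and the subtype labels are illustrative assumptions rather than the study's code.

```python
from typing import Optional


def hormone_receptor_positive(er: Optional[bool], pr: Optional[bool]) -> Optional[bool]:
    """ER+ and/or PR+ -> True; ER- and PR- -> False; otherwise undetermined."""
    if er is True or pr is True:
        return True
    if er is False and pr is False:
        return False
    return None


def classify_subtype(er: Optional[bool], pr: Optional[bool],
                     her2: Optional[bool], grade: Optional[int]) -> Optional[str]:
    """St. Gallen surrogate subtype, with grade standing in for Ki-67.

    Returns None when the available markers cannot determine the subtype.
    """
    hr_pos = hormone_receptor_positive(er, pr)
    if hr_pos is None or her2 is None:
        return None
    if not hr_pos:
        # Hormone-receptor negative: HER2 status alone separates the subtypes.
        return "HER2-enriched" if her2 else "TNBC"
    if her2:
        return "Luminal B (HER2+)"  # grade not needed for this subtype
    if grade is None:
        return None
    if grade not in (1, 2, 3):
        raise ValueError(f"unexpected grade: {grade}")
    # Grade I/II -> luminal A; grade III -> luminal B (HER2-).
    return "Luminal A" if grade <= 2 else "Luminal B (HER2-)"
```

Grade is consulted only for hormone-receptor-positive, HER2-negative tumours, which matches the text's use of grade as a Ki-67 proxy in luminal tumours.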

Type 2 diabetes mellitus

T2DM status was assigned based on cleaned data regarding diabetes type in the research dataset. T2DM duration at breast cancer diagnosis was calculated as the time difference between the date of the earliest mention of T2DM in the SCI-diabetes research dataset and the diagnosis date of breast cancer in SMR0616.
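The duration calculation described above can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, and expressing days in years with a 365.25-day year is our choice, not something the text specifies.

```python
from datetime import date
from typing import Iterable


def t2dm_duration_years(t2dm_record_dates: Iterable[date], bc_diagnosis: date) -> float:
    """Years from the earliest T2DM record to breast cancer diagnosis."""
    dates = list(t2dm_record_dates)
    if not dates:
        # No T2DM record: duration is missing, so the woman is not a complete case.
        raise ValueError("no T2DM records: duration is missing")
    earliest = min(dates)  # earliest mention of T2DM
    days = (bc_diagnosis - earliest).days
    if days < 0:
        raise ValueError("T2DM first recorded after breast cancer diagnosis")
    return days / 365.25  # assumed average year length
```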

Mode of detection

The mode of detection was categorized into screen-detected or non-screen-detected19.

Deprivation

For this analysis, the 2016 version of SIMD was used, expressed in quintiles ranging from the most deprived (quintile 1) to the least deprived (quintile 5)20.

Scottish region

Regions were assigned based on each woman’s residence at the time of breast cancer diagnosis, using Scotland’s 14 National Health Service (NHS) boards, which were grouped into three regions (West, North, and South-east) for analysis. The West region comprised Ayrshire and Arran, Forth Valley, Lanarkshire, and Greater Glasgow and Clyde. The North comprised the Western Isles, Grampian, Highland, Orkney, Shetland, and Tayside. The South-east comprised Borders, Dumfries and Galloway, Fife, and Lothian21.
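The grouping of NHS boards into analysis regions is a fixed lookup. A minimal Python sketch, assuming the official board names listed in the dictionary below (the dictionary and function names are hypothetical):

```python
# NHS board -> analysis region, following the grouping described above.
BOARD_TO_REGION = {
    "Ayrshire and Arran": "West",
    "Forth Valley": "West",
    "Lanarkshire": "West",
    "Greater Glasgow and Clyde": "West",
    "Western Isles": "North",
    "Grampian": "North",
    "Highland": "North",
    "Orkney": "North",
    "Shetland": "North",
    "Tayside": "North",
    "Borders": "South-east",
    "Dumfries and Galloway": "South-east",
    "Fife": "South-east",
    "Lothian": "South-east",
}


def region_for(board: str) -> str:
    """Return the analysis region for a board, failing loudly on unknown names."""
    try:
        return BOARD_TO_REGION[board]
    except KeyError:
        raise ValueError(f"unknown NHS board: {board!r}") from None
```

Raising on unknown names, rather than returning a default, keeps misspelt or missing board codes from silently entering a region.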

Survival

Breast cancer deaths were identified from the primary cause of death in death records. Follow-up was defined as the time from breast cancer diagnosis (recorded in SMR06 as the first consultation or hospital admission for breast cancer) to the earlier of death or November 30th, 202122. Follow-up was censored at the date of death from other causes. Non-parametric Kaplan–Meier estimates were used to describe breast cancer specific survival over a ten-year period by molecular subtype for two age categories: eligible for breast cancer screening (50 to 69 years) and not routinely eligible for screening (70 to 84 years).
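The follow-up definition and the Kaplan–Meier estimator can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's R code: the function names are hypothetical, and follow-up is converted to years with a 365.25-day year. Deaths from other causes are treated as censored, as in the text; in the presence of competing risks this estimates net rather than crude breast cancer survival.

```python
from datetime import date
from typing import List, Optional, Tuple

STUDY_END = date(2021, 11, 30)  # end of follow-up stated in the text


def follow_up(bc_date: date, death_date: Optional[date],
              bc_primary_cause: bool) -> Tuple[float, int]:
    """Return (years of follow-up, event), where event = 1 only for a
    breast cancer death; other-cause deaths and survivors are censored."""
    if death_date is not None and death_date <= STUDY_END:
        return (death_date - bc_date).days / 365.25, int(bc_primary_cause)
    return (STUDY_END - bc_date).days / 365.25, 0


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t), reported at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied times: deaths at t count against everyone still at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censorings both leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 0, 1, 1, 0, 0])` steps down to 5/6, 2/3 and 4/9 at times 1, 2 and 3.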

Cox proportional hazards models were fitted for univariate and multivariable analyses of breast cancer survival and various prognostic factors. Follow-up was limited to a maximum of three years to maximise the size of the dataset and to reduce the violations of the proportional hazards assumption that occurred with five years of follow-up. Models were adjusted for the following covariates: molecular subtype, age at breast cancer diagnosis, year of breast cancer diagnosis, Scottish region, mode of detection, treatment (binary variables for surgery, chemotherapy, radiotherapy, and hormone therapy), SIMD quintile, and T2DM duration. To assess proportionality, tests based on Schoenfeld residuals were conducted for each variable. All analyses were carried out using R 4.623.
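Restricting the Cox models to three years of follow-up is usually done by administrative censoring at the horizon. A minimal Python sketch, assuming follow-up is already in years (the function name is hypothetical):

```python
from typing import Tuple


def truncate_follow_up(time_years: float, event: int,
                       horizon: float = 3.0) -> Tuple[float, int]:
    """Administratively censor follow-up at `horizon` years.

    Events occurring after the horizon are not counted; those women are
    treated as censored at the horizon.
    """
    if time_years > horizon:
        return horizon, 0
    return time_years, event
```

The truncated times and events could then be passed to a Cox model, for example `coxph()` from the R survival package, whose `cox.zph()` function implements the Schoenfeld-residual test of proportionality.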

Multiple imputation was performed as a sensitivity analysis, and its results were compared with those from the complete case models. Missing covariate data for ER status, PR status, HER2 status, grade, SIMD, mode of detection, and treatments were imputed using chained equations, with imputation models compatible with the analysis model24. As described by White and Royston25, the outcome was included in the imputation models as the Nelson–Aalen estimate of the cumulative hazard at each woman’s follow-up time, together with the event (censoring) indicator. Of the initial 3635 cases, 16% had a missing value for at least one variable, so 30 imputations were used to give reliable estimates24. Each imputed dataset was analysed separately, and Rubin’s rules were used to combine the coefficients from the models26.
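Two computational pieces of this procedure can be sketched in Python: the Nelson–Aalen cumulative hazard evaluated at each woman's own follow-up time (the outcome summary recommended by White and Royston), and Rubin's rules for pooling one coefficient across imputations. The imputation step itself is not shown. Function names are hypothetical, the O(n²) loop favours clarity over speed, and the pooled standard error would normally be paired with Rubin's t-based degrees of freedom, which this sketch omits.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def nelson_aalen_at_own_time(times: Sequence[float],
                             events: Sequence[int]) -> List[float]:
    """Nelson-Aalen cumulative hazard H(T_i) evaluated at each woman's own
    follow-up time T_i (ties included), for use as an imputation predictor."""
    cumulative = 0.0
    h_at = {}
    for t in sorted(set(times)):
        n_at_risk = sum(1 for s in times if s >= t)
        d = sum(e for s, e in zip(times, events) if s == t)
        cumulative += d / n_at_risk
        h_at[t] = cumulative
    return [h_at[t] for t in times]


def rubin_pool(estimates: Sequence[float],
               std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient (e.g. a log hazard ratio) across m imputations.

    Returns the pooled estimate and its total standard error
    sqrt(W + (1 + 1/m) * B), where W is the mean within-imputation
    variance and B the between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations with matching SEs")
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean([se ** 2 for se in std_errors])
    between = statistics.variance(estimates)  # divisor m - 1
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)
```

Pooling is done on the log-hazard scale; the pooled hazard ratio is then `math.exp(q_bar)`.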

Results

Characteristics of cohort by vital status

Between 2010 and 2019, 3635 women aged 50 to 84 years with T2DM were diagnosed with breast cancer in Scotland (Fig. 1). Of these women, 3042 (84%) had no missing data (Fig. 1, Table 1). Over half of the women had luminal A cancer (58%), followed by luminal B (HER2-, 19.2%), TNBC (9.9%), luminal B (HER2+, 8.5%), and HER2-enriched (4.4%). Women with HER2-enriched tumours had the highest proportion of breast cancer deaths (34.1%) over the follow-up period (median follow-up [IQR] = 4.59 years [2.77, 7.28]), and women with luminal A tumours had the lowest (8.3%) (Table 1). A higher proportion of women aged 70 to 84 years died of breast cancer (17.7%) compared with younger women (10.2%). A third of women (33.7%) had a screen-detected breast cancer, and of these, only 5.4% died of breast cancer, whereas the proportion of breast cancer deaths was much higher among women with non-screen-detected diagnoses (18.1%). Among treatment categories, women who received chemotherapy had the highest proportion of breast cancer deaths (20.1%), followed by those who received radiotherapy (11.0%), hormone therapy (11.0%), and surgery (8.5%). There was no distinct pattern in the proportion of breast cancer deaths by deprivation, Scottish region, or T2DM duration. In contrast, the proportion of breast cancer deaths differed by molecular subtype, age group, year of breast cancer diagnosis, mode of detection, and treatment (Table 1).

Table 1 Descriptive characteristics of women with complete covariate data diagnosed with breast cancer and type 2 diabetes.

Among the eligible cohort of 3635 women, missingness was highest for PR status (17.99%), tumour grade (11.17%), and HER2 status (6.82%); all other variables had less than 2.5% missingness (Table 2). The distribution of descriptive characteristics was similar between the complete case and imputed cohorts (Table 3), with a slightly higher proportion of luminal A tumours and a lower proportion of other tumours in the imputed cohort.

Table 2 Missingness table of eligible cohort.
Table 3 Descriptive characteristics of eligible (complete case) and imputed cohort.

Survival

Women with luminal A breast cancer had the best breast cancer specific survival over ten years in both age categories (Fig. 2). Among women diagnosed between the ages of 50 and 69, those with HER2-enriched tumours had the worst survival, whereas among those diagnosed between the ages of 70 and 84, women with TNBC tumours had the worst survival (Fig. 2, Table S1).

Fig. 2

Kaplan–Meier curves demonstrating breast cancer specific survival by molecular subtype. Breast cancer specific survival of women with type 2 diabetes and diagnosed with breast cancer in Scotland from 2010 to 2019, aged 50 to 69 (A) and 70 to 84 (B) (95% confidence intervals are represented by shading).

In both the minimally adjusted and adjusted complete case Cox models (Table 4), breast cancer mortality over a 3-year follow-up period was higher among women with all other tumour subtypes than among those with luminal A tumours. Among these subtypes, excess mortality was lowest for luminal B (HER2-) tumours, followed by luminal B (HER2+), HER2-enriched, and TNBC tumours.

Table 4 Minimally adjusted and adjusted Cox models of breast cancer mortality in complete case and imputed cohort.

In the adjusted model, breast cancer mortality was higher for women with non-screen-detected tumours than for those with screen-detected tumours (HR = 2.52, 95% CI: 1.62 to 3.92, Table 4), for women who did not undergo surgery than for those who did (HR = 14.02, 95% CI: 10.52 to 18.69), and for women who did not receive hormone therapy than for those who did (HR = 4.15, 95% CI: 3.16 to 5.44). Conversely, breast cancer mortality was lower among women who did not receive chemotherapy than among those who did (HR = 0.71, 95% CI: 0.51 to 0.97), and among women who did not undergo radiotherapy than among those who did (HR = 0.71, 95% CI: 0.53 to 0.93). The adjusted model indicated no associations between breast cancer mortality and age at breast cancer diagnosis, year of breast cancer diagnosis, SIMD quintile, Scottish region, or T2DM duration. Notably, mode of detection and treatments were among the strongest predictors of survival (Table 4).
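The hazard ratios above are exponentiated Cox coefficients, and because several comparisons are phrased as "did not receive" versus "did receive", it can help to re-express them with the reference category swapped. A minimal Python sketch, assuming Wald-type intervals computed on the log scale (function names are hypothetical):

```python
import math
from typing import Tuple


def hr_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Hazard ratio and Wald 95% CI from a Cox coefficient (log hazard ratio)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Back-calculate the SE of log(HR) from a reported Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def flip_reference(hr: float, lower: float, upper: float) -> Tuple[float, float, float]:
    """Re-express an HR with the reference category swapped:
    the HR inverts, and the CI bounds invert and swap places."""
    return 1 / hr, 1 / upper, 1 / lower
```

For instance, `flip_reference(0.71, 0.51, 0.97)` gives an HR of about 1.41 (95% CI: 1.03 to 1.96) for receiving versus not receiving chemotherapy, which is the same association stated the other way round.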

Compared with the complete case cohort, the imputed cohort showed lower HRs for breast cancer specific mortality across molecular subtypes in both the minimally adjusted and adjusted models (Table 4). The minimally adjusted model followed the same pattern of excess mortality by molecular subtype as in the complete case cohort; however, the fully adjusted model had attenuated HRs for all subtypes compared with the complete case estimates. The difference in HRs was greatest for the HER2-enriched (HR = 1.71 in the imputed cohort vs 4.68 in the complete case cohort) and TNBC (HR = 2.74 vs 4.70) subtypes.

Discussion

As expected, among women with T2DM, mortality was lowest for those diagnosed with luminal A tumours. These findings are consistent with prior research in the general population with breast cancer in Scotland27. In the general Scottish breast cancer population aged 50 to 69, regardless of diabetes status, the lowest crude 5-year breast cancer specific survival was observed among women with TNBC tumours (78.6%), compared with 81.7% for HER2-enriched tumours and 95.5% for luminal A tumours27. In contrast, among women with T2DM diagnosed with breast cancer between the ages of 50 and 69 years, the lowest crude 3-year breast cancer specific survival was among those with HER2-enriched tumours, about 8% lower than for TNBC tumours (78.2%), while survival for luminal A tumours was 93.8%. In our adjusted Cox models after imputation, relative to women with luminal A tumours, the hazard of breast cancer death was almost twice as high for women with HER2-enriched tumours (HR = 1.71) and almost three times as high for women with TNBC tumours (HR = 2.74). These results show the importance of performing sensitivity analyses and imputing tumour characteristics in special populations such as women with diabetes.

Due to the aggressive nature of HER2-enriched tumours, improvement in prognosis relies on chemotherapy28. However, patients with diabetes often experience greater chemotherapy toxicity, potentially influencing survival outcomes29. Moreover, research has shown that diabetes has a negative effect in postoperative HER2-positive breast cancer patients treated with trastuzumab30. Although we observed higher mortality for HER2-enriched tumours in the complete case analysis, these estimates were attenuated after imputation of tumour characteristics, suggesting that current treatment may be adequate. Nevertheless, there is some evidence that other treatment approaches may improve outcomes in women with diabetes and breast cancer, and this should be pursued in future research30.

Adjusted Cox models indicated that women with screen-detected tumours had lower mortality. This aligns with observations in the United States, where adjusted Cox models with a 5-year follow-up period demonstrated higher mortality among women with symptom-detected breast cancer than among those with screen-detected breast cancer31. Previous research on deprivation in Scotland found higher breast cancer mortality among the most deprived individuals19, whereas no differences by deprivation were observed in our T2DM cohort. This may be because people with diabetes undergo continued clinical surveillance, which could reduce mortality not only from diabetes but also from breast cancer.

Our results indicated that women who received surgery or hormone therapy had lower mortality than those who did not, whereas those who received chemotherapy or radiotherapy had higher mortality than those who did not. Breast cancer is often treated with a combination of therapies that depends on the molecular subtype. Women receiving hormone therapy and surgery are more likely to have early-stage cancer or a luminal subtype, or to be in better health14,32, whereas women with an aggressive subtype are more likely to receive chemotherapy or radiotherapy15,32. Because treatment allocation therefore reflects prognosis (confounding by indication), estimating the effect of treatment on mortality in observational studies is complex and requires further investigation. The PREDICT breast cancer prognostic tool could be valuable in further analyses, as it provides personalised 5-year and 10-year survival estimates with and without treatments. Further research could validate the PREDICT tool for the T2DM population in larger datasets33.

While this study focused on prognostic factors for breast cancer specific survival in women with T2DM and breast cancer, it is crucial to acknowledge the competing risks of death from other causes. For example, individuals with T2DM face an elevated risk of cardiovascular disease (CVD), which has important effects on overall health and mortality and may act as a competing risk for breast cancer specific mortality34.

To our knowledge, this study is the first population-based investigation of prognostic factors for breast cancer specific survival in women with T2DM and breast cancer conducted in Scotland and the United Kingdom. A key strength of the investigation is the high-quality and nationwide data in SCI-diabetes with close to 100% completeness16,17. Moreover, these Scottish health datasets contain essential prognostic factors such as area-based deprivation (SIMD) and mode of detection that are not commonly available in many other nationwide databases.

Although this study used high-quality data, incomplete molecular subtype information in the SMR06 registry was a limitation. The missingness is likely non-random, with women excluded from the cohort more likely to have poorer survival outcomes35. To address this, we conducted a sensitivity analysis using multiple imputation by chained equations, which highlights the importance of accounting for missing data. In our imputed models, HRs for all subtypes were attenuated, particularly for the most aggressive subtypes, TNBC and HER2-enriched. This suggests that the complete case analysis may have overestimated the relative risk for these subtypes. One possible explanation is that the complete case analysis omitted a greater proportion of women with less aggressive subtypes, who were less likely to be treated and are generally older36. Such selection by factors that affect breast cancer specific mortality could have inflated the complete case HRs, because the women retained in the reference group (luminal A) were generally healthier than those with more aggressive subtypes36. Further, multiple imputation allowed better adjustment for covariates and resulted in more precise estimates, which is particularly important for small subgroups such as the TNBC and HER2-enriched subtypes.

As with any observational study, this study has the potential for bias and confounding. Survival analyses are prone to lead-time and length biases: screening brings the date of diagnosis forward, lengthening measured survival even when the date of death is unchanged, and it preferentially detects slower-growing tumours. Owing to the breast cancer screening programme in Scotland, women had a better chance of survival than if no screening programme existed37. The average breast cancer screening uptake in Scotland from 2011 to 2021 was 72%38. Women with diabetes have been reported to have lower screening uptake than women without diabetes in some populations, which would be worth exploring further when linkage with breast screening data becomes available in Scotland39,40. Moreover, incorporating data on glycaemic control could help explore its potential effect on breast cancer outcomes16. Multivariable analyses were conducted to minimise confounding by known and available covariates. Although many important covariates, such as treatments and deprivation, were adjusted for, unmeasured confounders could have affected the results. For example, comorbidities, T2DM treatments, TNM stage, lifestyle factors (smoking status, physical activity, diet, alcohol consumption), and reproductive factors (age at menarche, parity, menopausal status) could be associated with both subtype and survival. The 3-year follow-up, selected for comparison with the complete breast cancer population in Scotland27, is a limitation of the study. Longer follow-up could offer additional insights, and it would be worthwhile to repeat the analysis as follow-up data accumulate.

In conclusion, this study highlights the effects of various prognostic factors on mortality in a cohort of women with T2DM and breast cancer. Molecular subtype and mode of detection were identified as prognostic factors for breast cancer mortality, along with complex associations with treatments. The molecular subtype distribution and survival trends of women with T2DM and breast cancer were similar to those of the general breast cancer population in Scotland. However, our findings provide limited evidence of potential differences between women in the general population and women with T2DM diagnosed with breast cancer at 50 to 69 years of age in the ranking of breast cancer specific survival for the TNBC and HER2-enriched subtypes. Given these differences and the complexity of treatment associations in this population, there is scope for further investigation of breast cancer survival by subtype among women with T2DM in other, larger populations, including direct comparisons with survival among women without T2DM.