Introduction

Breast cancer is the leading cause of cancer-related deaths among women in Brazil, with an age-standardized mortality rate of 12 deaths per 100,000 (ref. 1). Around 40% of breast cancer diagnoses in the country are made at advanced stages (TNM III-IV), disproportionately affecting brown and black women with lower education levels2. Mammography is the most effective and reliable technology for detecting breast cancer in its early stages3. However, there has been limited research evaluating the accuracy of mammography results derived from population data systems4,5,6,7.

Although mammography is a widely used test for breast cancer screening, the percentage of true-positive results in women aged 50 to 69 years varies between 75% and 85%. This means that approximately 15–25% of breast tumors may not be detected by the test. Factors contributing to this variability include breast density, the training of imaging technicians, and the maintenance of mammography equipment8,9,10,11. False-negative results can delay treatment, while false-positive results can lead to unnecessary biopsies and high levels of anxiety among patients12,13.

Assessing accuracy by comparing mammography results with biopsy-confirmed breast cancer diagnoses is a valuable way to evaluate the quality of a breast screening program. In Germany, sensitivity has been used as a performance indicator for this evaluation, and it has increased over the years, reaching over 80% (ref. 4). In the United States, the rate of false-positive results was 12% and the rate of false-negative results was 1% among women aged 40 to 89 years undergoing screening mammography. The higher percentage of false-positive results was associated with tests performed on younger women, women with high breast density, and those with a family history of breast cancer5.

By analyzing data obtained from national health information systems, it is possible to assess the quality of results in a systematic and standardized way, as well as identify areas where improvements are needed, guiding the planning of targeted interventions to improve the quality of mammography in screening programs14,15. The Brazilian National Cancer Institute is responsible for implementing national initiatives to improve mammogram quality for early breast cancer detection. These efforts include developing software for processing and managing results16. However, the accuracy of mammograms has not been extensively evaluated in Brazil.

This study aims to compare screening mammogram results with breast biopsy outcomes in the state of São Paulo. We investigated factors associated with false-positive results, considering both characteristics of the women screened and features of the reported breast lesions.

Methods

Design overview

In this cross-sectional study of registry data, we analyzed data from the Brazilian Breast Cancer Information System (SISMAMA) database, specifically using data from 2012 to 2013. These years mark the final period of a unified national database, providing essential reference points for research. The dataset comprises records of mammograms conducted at radiology facilities and the biopsies conducted in anatomopathological laboratories within the Unified Health System (SUS) in the state of São Paulo.

Setting and participants

The analysis encompasses data for 1,167,035 women who underwent routine mammography screening in the state of São Paulo from January 1, 2012, to December 31, 2012. These mammography exams were linked to 29,332 biopsies conducted between January 1, 2012, and December 31, 2013. Records were linked using identifying information for the women screened, such as names, mothers’ names, and birth dates. In 654 cases where the mothers’ names were unavailable, we used names and dates of birth as matching criteria. The overall linkage performance, reflecting the merging of the data matrix, reached 98.6%. Python version 3.7 was employed for the linkage procedures17. From the 13,098 results obtained through linkage, duplicated data were excluded, taking into account the BI-RADS classification: for each woman, the record with the highest BI-RADS value was retained. Additionally, 92 males were excluded, resulting in 10,501 mammography-biopsy linkage results available for analysis (Fig. 1).
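The linkage and de-duplication steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the field names (`name`, `mother_name`, `birth_date`, `birads`), the exact-match strategy after simple normalization, and the use of name plus birth date to identify a woman during de-duplication are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(text):
    """Upper-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join((text or "").upper().split())


def full_key(rec):
    """Name + mother's name + birth date; None when the mother's name is missing."""
    mother = normalize(rec.get("mother_name"))
    if not mother:
        return None
    return (normalize(rec["name"]), mother, rec["birth_date"])


def fallback_key(rec):
    """Name + birth date, used when the mother's name is unavailable."""
    return (normalize(rec["name"]), rec["birth_date"])


def link(mammograms, biopsies):
    """Return (mammogram, biopsy) pairs that share a matching key."""
    by_full, by_fallback = defaultdict(list), defaultdict(list)
    for m in mammograms:
        key = full_key(m)
        if key is not None:
            by_full[key].append(m)
        by_fallback[fallback_key(m)].append(m)

    pairs = []
    for b in biopsies:
        key = full_key(b)
        if key is not None:
            matches = by_full.get(key, [])
        else:  # assumed fallback rule: no mother's name on the biopsy record
            matches = by_fallback.get(fallback_key(b), [])
        pairs.extend((m, b) for m in matches)
    return pairs


def keep_highest_birads(pairs):
    """For each woman, keep only the linked pair with the highest BI-RADS value."""
    best = {}
    for m, b in pairs:
        woman = fallback_key(m)  # hypothetical per-woman identifier
        if woman not in best or m["birads"] > best[woman][0]["birads"]:
            best[woman] = (m, b)
    return list(best.values())
```

A production linkage would typically add phonetic or probabilistic matching to tolerate spelling variants; the sketch only shows the key-then-fallback logic and the highest-BI-RADS rule.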

Outcome measures and factors influencing mammography results

To assess agreement between mammography and biopsy findings, the results were categorized into true-positives and false-positives, based on criteria established in prior studies5,12,13,18. BI-RADS 4 (suspicious) and BI-RADS 5 (highly suspicious) results followed by positive biopsies were categorized as true-positives, while BI-RADS 4 and BI-RADS 5 results followed by negative biopsies were considered false-positives. Conversely, BI-RADS 2 (benign) and BI-RADS 3 (probably benign) results followed by negative biopsies were considered true-negatives, while BI-RADS 2 and BI-RADS 3 results followed by positive biopsies were classified as false-negatives. None of the screening mammograms that remained for analysis was classified as BI-RADS 1. The 393 mammograms classified as BI-RADS 0, regardless of the biopsy result, were excluded from the analysis as they could not be categorized as true- or false-positives (Fig. 1).
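The categorization rules above amount to a small decision table. The following sketch encodes them in Python; the function name and the boolean `biopsy_positive` argument are illustrative choices, not part of SISMAMA.

```python
def classify(birads, biopsy_positive):
    """Map a BI-RADS category and a biopsy outcome to an accuracy category.

    Returns None for records that cannot be categorized (for example BI-RADS 0,
    which the study excluded from the analysis).
    """
    if birads in (4, 5):  # suspicious or highly suspicious
        return "true-positive" if biopsy_positive else "false-positive"
    if birads in (2, 3):  # benign or probably benign
        return "false-negative" if biopsy_positive else "true-negative"
    return None
```

For example, `classify(4, False)` yields `"false-positive"`, while `classify(0, True)` yields `None` and the record would be dropped.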

We examined factors related to the women screened for breast cancer and factors related to breast lesions in order to assess the probability of false-positive results in screening mammograms. Factors related to the women screened were age (< 50, 50–59, 60–69, or 70+ years), skin color (white, black, brown, or yellow), education (illiterate, incomplete primary education, complete primary education, complete secondary education, or graduated), family history of cancer (yes or no), breast skin type (retracted, thickened, or normal), breast density (fatty/predominant fatty or dense/predominant dense), use of hormonal therapy (yes or no), and previous radiotherapy (yes or no).

Breast lesion-related factors included type (solid nodules, cysts, calcifications, or solid cysts), edge (defined or undefined), characteristics (regular or irregular), size (≤ 10 mm, 11–20 mm, or > 20 mm), and topographical site (upper lateral quadrant, lower lateral quadrant, upper medial quadrant, lower medial quadrant, union of lateral quadrants, union of medial quadrants, union of upper quadrants, union of lower quadrants, retroareolar region, or axillary extension).

Furthermore, we assessed the time to diagnosis, indicated by the number of days between the date of mammogram result and the release of the biopsy result.
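As a small illustration of this measure, the sketch below computes the interval in days between two dates and flags long delays. The function names are hypothetical, and the default 150-day cut-off mirrors the threshold used for the long-delay comparison reported in the Results.

```python
from datetime import date


def days_to_diagnosis(mammogram_result_date, biopsy_release_date):
    """Days elapsed between the mammogram result and the release of the biopsy result."""
    return (biopsy_release_date - mammogram_result_date).days


def is_long_delay(days, threshold=150):
    """True when the time to diagnosis exceeds the chosen threshold (in days)."""
    return days > threshold
```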

Statistical analysis

We calculated the proportions of true-positive and false-positive results and compared them across strata or values of variables using the chi-square test or the t-test. Logistic regression analysis was used to estimate the odds ratios of false-positive results according to potential predictors, with true-positive results as the reference. The decision to retain variables in the logistic regression models was based on clinical criteria and statistical significance. Two models were designed: Model 1 included only factors related to the women screened (age, family history of cancer, breast skin type, breast density, hormonal therapy, and previous radiotherapy); Model 2 included only factors related to breast lesions (type, edges, characteristics, and size). These models included variables with significant chi-square test values. The backward conditional approach was applied to assess the statistical significance of the variables included in both models, and those that did not fit the models were removed. Skin color and education were excluded from Model 1, while the topographical site of breast lesions was excluded from Model 2. For breast lesion type, solid nodules served as the reference category, as this type of lesion is more common in true-positive results19,20,21,22. The effect of false-positive screening mammograms on time to diagnosis was also examined, adjusting for age and lesion size. Odds ratios (OR) and corresponding 95% confidence intervals (95% CI) were estimated to assess the association between predictor variables and false-positive mammogram results. Analyses were performed using SPSS software, version 29.
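To make the reported quantities concrete, the sketch below computes an unadjusted odds ratio and its Wald 95% confidence interval from a 2×2 table, with true-positives as the reference group. This is a simplification: the study estimated adjusted odds ratios with logistic regression in SPSS, and the function name and the cell layout are assumptions for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: false-positives with the factor     b: true-positives with the factor
    c: false-positives without the factor  d: true-positives without the factor
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level, which is how the OR and 95% CI pairs in the Results are read.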

Ethical issues

This study was approved by the Human Research Ethics Committee at the School of Public Health of the University of São Paulo (CAAE No. 68509523.8.0000.5421) and all relevant ethical regulations were followed. Informed consent was obtained from all subjects.

Results

Among the 10,501 linkage results from screening mammograms and biopsies, 4,363 (41.5%) were true-positives, 5,544 (52.8%) were false-positives, 198 (1.9%) were true-negatives, and 3 (0.03%) were false-negatives. A total of 393 linkage results (3.8%) were excluded from the analysis because they did not fit into any of these categories (Fig. 1). Additionally, we observed that 5,017 (37%) women with BI-RADS 4 mammography results and 322 (20%) women with BI-RADS 5 results did not undergo breast biopsies or were lost to follow-up (data not shown).
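The reported shares can be checked directly from the counts. The short sketch below does that arithmetic; it also derives the proportion of biopsied BI-RADS 4 or 5 mammograms that were confirmed as cancer, a figure that is computed here from the published counts and is not stated by the authors.

```python
# Counts taken from the paragraph above.
counts = {
    "true-positive": 4363,
    "false-positive": 5544,
    "true-negative": 198,
    "false-negative": 3,
}
TOTAL_LINKED = 10501  # all mammography-biopsy linkage results


def percent(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


# Share of each category among all linked results.
shares = {label: percent(n, TOTAL_LINKED) for label, n in counts.items()}

# Derived (not reported in the text): share of biopsied BI-RADS 4/5 screens confirmed as cancer.
biopsied_positive_screens = counts["true-positive"] + counts["false-positive"]
confirmed_share = percent(counts["true-positive"], biopsied_positive_screens)
```

Because false-negatives are so rare, their share only becomes visible with two decimal places, for example `percent(3, TOTAL_LINKED, digits=2)`.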

Fig. 1

Number of screening mammograms conducted in 2012 and biopsies conducted in 2012 and 2013* in the state of São Paulo, along with linkage of mammograms and biopsies, and BI-RADS classification. *Data obtained from the Brazilian Breast Cancer Information System (SISMAMA).

Among true-positive results, a higher proportion of women were aged 50 to 59 years. Conversely, among false-positive results, more than half of the women were under 50 years. The majority of women with true- or false-positive mammography results identified themselves as white and had completed primary education. However, women with incomplete primary education were more prevalent among true-positive results, while women with complete secondary education were more frequent among false-positive results. Women with true-positive results more frequently reported a family history of cancer and previous radiotherapy compared to those with false-positive results. On the other hand, the use of hormone therapy, thickened breast skin, and dense breasts were more common among women with false-positive results compared to those with true-positive results (Table 1).

Table 1 Women-related factors associated with false-positive results in screening mammograms.

Calcifications, lesions with defined edges, and lesions smaller than 10 mm were more common in false-positive results. In contrast, solid nodules, lesions with undefined edges, and lesions ranging from 11 to 20 mm in size were more common in true-positive results. The upper lateral quadrant was the predominant topographical site for breast lesions in both true- and false-positive results (Table 2). Figure 2 shows that women with false-positive screening results experienced a significantly longer time until diagnosis (160 days; SD: 20) compared to those with true-positive results (135 days; SD: 30).

Table 2 Lesion-related factors associated with false-positive results in screening mammograms.
Fig. 2

Diagnosis time (days) according to false-positive and true-positive results in screening mammograms, evaluated by the t-test. Boxes represent the interquartile range, the line inside each box represents the mean, and the bottom and top lines of the box are the first and third quartiles, respectively. Whisker limits are the lowest and highest observations within 1.5 times the interquartile range from the lower and upper quartiles. Circles represent outliers. * p < 0.05 for the difference in diagnosis time between groups.

Tables 1 and 2 present the results of the multivariate analysis for factors related to the women screened and factors related to breast lesions, respectively, comparing false-positive to true-positive results. Regarding women-related factors, age under 50 years (OR: 2.9; 95% CI 1.2–6.9), use of hormone therapy (OR: 1.4; 95% CI 1.1–2.3), and presence of dense breasts (OR: 1.6; 95% CI 1.5–1.9) were predictors of false-positive mammogram results. No statistically significant association was observed for breast skin type or previous radiotherapy (Table 1). Concerning breast lesion-related factors, the presence of calcifications (OR: 3.4; 95% CI 2.9–4.0), a defined lesion edge (OR: 2.4; 95% CI 1.9–3.2), and lesion size smaller than 10 mm (OR: 3.8; 95% CI 3.4–4.3) were predictors of false-positive results. No statistically significant association was observed for lesion characteristics (Table 2). False-positive screening mammography results were twice as likely to be associated with a long time to diagnosis (> 150 days, OR: 2.0; 95% CI 1.6–2.5; data not shown).

Discussion

The percentage of false-positive mammography results observed in our study in São Paulo (52.8%) was substantially higher than the rates reported in studies from high-income countries such as the United States and Germany, which ranged from 12% to 20% (refs. 4, 5). These discrepancies may be partly attributed to differing criteria used to define false-positive results. For instance, the study in the United States focused solely on false-positive rates from digital mammography screening, which is more accurate in women under the age of 50, women with dense breasts, and premenopausal or perimenopausal women23. In São Paulo, digital mammography has become widely used only recently. Nevertheless, the high rate of false-positives observed in our study indicates a significant barrier to the effectiveness of the breast screening program in the state. It is therefore crucial to understand the factors contributing to false-positive mammography results and to develop strategies to address this issue.

In this study, we investigated the factors contributing to false-positive results in mammography screening, recognizing that evaluating these factors is essential for assessing the extent of breast cancer overdiagnosis in a given population24,25. Understanding these causes can help refine screening protocols, improve diagnostic accuracy, reduce anxiety among women undergoing screening, and minimize unnecessary follow-ups.

This was the first study conducted in Brazil using population data to associate false-positive mammogram results with both factors related to the women screened and factors related to breast lesions. Our results indicate that age under 50 years, use of hormone therapy, and dense breasts, as well as calcifications, lesions with defined edges, and lesions smaller than 10 mm, were the main factors affecting mammography accuracy.

False-positive mammogram results have been associated with age, hormone therapy, family history of cancer, and breast density in previous studies5,24,25. The recent recommendation by the US Preventive Services Task Force to lower the starting age for mammographic screening from 50 to 40 years is under debate26, as approximately two-thirds of women under 50 with breast cancer are classified as non-“high risk”27. Screening guidelines vary globally: the starting age is 40 years in Japan, 45 years in China, and 50 years in Canada, Malaysia, and Germany. However, the appropriate starting age for breast cancer screening remains a topic of controversy in several countries, including other European nations, Singapore, Australia, the United States, and Brazil28. The Brazilian Ministry of Health recommends biennial screening mammograms for women aged 50 to 69 years16, while the Brazilian Society of Mastology advocates extending mammographic screening to women aged 40 to 49 years29.

As our study found a higher proportion of false-positive results among women under 50, lowering the starting age for mammographic screening could reduce accuracy and increase overdiagnosis rates, potentially adversely affecting women’s emotional health30,31,32. However, Black, Hispanic, Asian, and Native American women are more likely to develop invasive breast cancers and experience higher breast cancer-related mortality at younger ages compared to White women33. In this context, incorporating ethnicity or skin color into screening recommendations for women under 50 could help address the higher breast cancer mortality rates observed in certain ethnic groups. For instance, initiating biennial screening for Black women at age 40 has been shown to reduce disparities in breast cancer mortality, with benefit-harm ratios comparable to those of biennial screening for White women aged 50 to 74 years34.

Reduced mammography accuracy has also been found in women over 50 years of age who are undergoing hormone therapy, as they tend to have a higher proportion of dense breast tissue compared to non-users35,36,37. A meta-analysis suggested that age may influence the association between hormone therapy use and mammography sensitivity and specificity38. Our data showed an increase in false-positives among women using hormones, regardless of age. Additionally, our results revealed a higher percentage of false-positives among women with dense or predominantly dense breasts. However, more studies are needed to adequately evaluate the impact of the type, duration, and doses of hormonal therapy on false-positive mammogram results and their relationship with the woman’s age.

Family history of cancer is a variable to be considered in the context of an organized breast cancer screening program, although the hereditary factor in the causality of breast cancer accounts for less than 10% (ref. 39). Nelson et al.5 suggested that false-positive mammography results are relatively common in younger women with a history of cancer in their family. In our study, we did not find an association between false-positive mammogram results and family history of cancer. In fact, an opposite trend was observed.

The detection of breast cancer through mammographic examination depends on aspects related to the lesions. Some characteristics of breast lesions can be easily identified, facilitating diagnosis. Other features are subtle, making the diagnosis more challenging. Moreover, some lesions are visible on mammograms, although the images do not match the pathological diagnoses19,20,21,22. In our study, we found an association between false-positive results and the presence of calcifications in women undergoing screening mammograms. Microcalcifications have been associated with benign lesions, while large solid lesions are more often associated with malignant ones22. Cole et al.21 found a higher sensitivity for the interpretation of solid masses and a lower sensitivity for the interpretation of calcifications using three image-processing algorithms.

The topographic location of the breast lesion can be a useful factor in evaluating mammography accuracy, as tissue characteristics differ in the upper, lower, medial, or lateral areas of the breast. For example, glandular tissue is concentrated in the upper and outer areas, while the proportion of adipose tissue varies in different areas. This density distribution can affect how abnormalities are detected by mammographic exams40,41. We did not find a significant association between different breast areas and mammography sensitivity. However, it is important to consider that mammography sensitivity can be influenced by the phase of the patient’s menstrual cycle at the time of the mammogram and by the training of the professionals responsible for conducting the examination and evaluating the images42,43.

We found that women with true-positive mammogram results experienced an average diagnosis period four times longer than the maximum of 30 days established by Brazilian legislation. For women with false-positive mammogram results, the time to diagnosis is even longer. Experiencing a false-positive mammogram actually increases the risk of late-stage diagnosis, as these women are more likely to delay subsequent mammograms44,45. Another concern in the state of São Paulo is the high percentage of women with suspicious (BI-RADS 4) or highly suspicious (BI-RADS 5) mammograms who did not undergo biopsies or were lost to follow-up. This highlights the need for an effective information system to monitor women screened for breast cancer within the context of an organized breast cancer screening program.

To reduce the number of false-positive and inconclusive mammogram results, it is essential to provide continuous training for healthcare professionals on proper screening procedures. This is a critical prerequisite for implementing an organized, population-based screening program45. Additionally, double-checking mammography images and ensuring proper maintenance of mammography equipment can further minimize false-positive outcomes.

Our study has some limitations, including the lack of data on body mass index and the absence of information on the type and duration of hormonal therapy in the SISMAMA database, both of which could potentially affect the sensitivity of mammography. Furthermore, there is a notably high amount of missing data for skin color, education, lesion topographical site, and lesion edges. However, this did not impede the analysis of these variables.

Analyzing big data in health is essential for evaluating and implementing clinical interventions, as well as for establishing public health policies. Health information systems can provide valuable and reliable data on factors associated with mammogram sensitivity. This information is important for planning and improving breast cancer screening programs by reducing uncertainty in results and shortening diagnostic times.

Conclusion

This study identified key factors associated with false-positive mammography results in the breast cancer screening program of São Paulo state. These factors include being under 50 years of age, use of hormone therapy, breast lesions with defined borders, lesions smaller than 10 millimeters, and the presence of calcifications. Additionally, women with false-positive results experienced a longer time to diagnosis.