Introduction

The Homologous Recombination (HR) Repair pathway is crucial in repairing double-stranded DNA breaks, and several proteins, e.g., BRCA1, BRCA2, and RAD51, are central to the process1,2. Dysregulated or dysfunctional pathway components lead to HR Deficiency (HRD), which is associated with genomic instability and risk of cancer development3,4. Although it is not yet part of routine clinical practice in breast cancer treatment, the HRD-status of a tumor is an important predictive marker, as patients with a high level of HRD have been shown to respond better to treatments such as poly (ADP-ribose) polymerase (PARP) inhibitors and platinum-based chemotherapies than patients with a functional HR Repair Pathway in breast5,6,7,8,9 and ovarian cancers5,10,11. Previously, it has been shown that the prevalence of HRD varies among breast cancer tumors according to their PAM50 subtype, with 46–62% of triple-negative breast cancer cases displaying HRD, compared to 2–23% for non-triple-negative breast cancer12,13. This highlights HRD as a clinically relevant biomarker for predicting treatment response in breast cancer.

Several tools have been developed to assess HRD-status using various HRD definitions and sequencing data types, such as panel sequencing and whole genome sequencing (WGS)14. This study compares two tools designed for HRD-status assessment using WGS data, HRDetect15 and ShallowHRD16, both of which have previously been trained on breast cancer cohorts15,16. HRDetect has been tested on tumor cohorts from breast8,15, ovarian15,17, and pancreatic cancers15,18, and it is often regarded as the best method to assess HRD-status due to the high precision and robustness of the tool. HRDetect uses several measurements of HRD, including loss of heterozygosity, single base substitutions, and larger structural changes15,19. In contrast, ShallowHRD only considers larger structural changes, namely Large-scale Genomic Alterations (LGAs)16, which may result in a less precise assessment of the HRD-status. HRDetect relies on deep WGS (≥ 30X) of preferably fresh frozen tumor tissue, as paraffin-induced artifacts might affect the analysis. Furthermore, germline sequencing is required for somatic variant calling19,20,21. Conversely, ShallowHRD relies on shallow tumor sequencing (~ 1X) and works on data from fresh frozen as well as Formalin-Fixed, Paraffin-Embedded (FFPE) tissue16.

Utilizing FFPE tissue offers a significant advantage in the clinical setting, as it is far more readily available than fresh frozen tumor tissue. Additionally, shallow sequencing would be preferable due to its lower cost (10–20 times cheaper, depending on WGS depth for HRDetect), as price can be a limiting factor. Guo et al. tested ShallowHRD with success in predicting ovarian cancer patients’ response to treatment with PARP inhibitors11, and Sztupinszki et al. conducted a comparison of HRDetect to the HRD-score of The Cancer Genome Atlas and concluded that they perform similarly in the assessment of HRD-status in ovarian cancer17. However, to our knowledge, proper head-to-head assessment of ShallowHRD compared to HRDetect has yet to be performed in breast cancer patients.

Therefore, we aim to compare ShallowHRD results to the previously reported HRD-status obtained from HRDetect22 in breast cancer patients to assess the usability of ShallowHRD in a clinical setting.

Materials and methods

General workflow

FFPE and fresh frozen tumor tissue were sequenced using WGS at 1-5X and 50X, respectively, and the fresh frozen data was additionally bioinformatically downscaled to 5X. The 50X Fresh Frozen data was used for HRDetect, making up the truth set for comparison. ShallowHRD was used on the fresh frozen data at 50X, the downscaled Fresh Frozen data, and the shallow FFPE sequencing data. Figure 1 provides an overview of the workflow.

Fig. 1

Study design: Fresh Frozen and FFPE tumor tissue from breast cancer (BC) patients (n = 19) were subjected to WGS at 50X and 1-5X, respectively. The 50X Fresh Frozen data was analyzed using HRDetect and sequencing data was then bioinformatically downscaled to 5X. The three datasets (Fresh Frozen (50X), Fresh Frozen (5X), and FFPE (1-5X)) were then analyzed using ShallowHRD. The HRD-status from ShallowHRD was then compared to the HRDetect output23. Created in BioRender. Thomassen, M. (2025) https://BioRender.com/a66p512.

Patient material

For this study, 19 breast cancer patients were selected from a cohort of patients that had previously been analyzed using data from WGS (50X) of fresh frozen tissue in HRDetect22 and analyzed for BRCAness using an RNA-profiler24. These 19 patients were selected based on the availability of FFPE tissue matching the previously analyzed frozen tissue. The inclusion criteria for the non-BRCA carriers were as previously described by Andersen et al., and 2 positive BRCA carriers from the original cohort were included22. Patient information can be found in Table S1, where germline variant information is also available22. Of the BRCA-positive patients, Sample 42 carried a known pathogenic germline BRCA1 variant (NM_007300.4:c.2475delC, p.(Asp825Glufs*21)), and Sample 47 carried a pathogenic germline BRCA2 variant (NM_000059.3:c.6490_6491delCA, p.(Gln2164Valfs*11)). The 16 remaining patients carried no known pathogenic variants in BRCA1 or BRCA2. However, two patients (Samples 10 and 37) had a hypermethylated BRCA1 promoter, and one (Sample 11) carried two variants: a somatic BRCA1 variant (NM_007300.4:c.3668T > A, p.(Leu1223His)) and a germline variant of unknown significance (VUS) in RAD51D (NM_002878.3:c.202G > A, p.(Gly68Ser)). The somatic BRCA1 VUS is considered likely benign, in part due to its low variant allele frequency in the original analysis22, but also due to its location within exon 11 of BRCA1 and outside functional domains, which according to ENIGMA guidelines25 indicates a likely benign variant (BP1_Strong). These five samples were found to be HRD-positive using HRDetect; the results, reported previously22, are available in Table S2.

Whole genome sequencing

The fresh frozen tissue samples and matched blood samples were prepared using the Illumina TruSeq Nano protocol and sequenced on Illumina Novaseq 6000 with paired-end 2 × 150 bp. The mean sequencing coverage was 38.8X for the blood samples and 51.8X for tumor tissue22. The tumor tissue data was termed the Fresh Frozen (50X) group in the following. Data is presented in Table S3. The WGS 50X Fresh Frozen data were then downscaled to 5X using seqtk26 on the FASTQ files.
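The downscaling step amounts to keeping a fixed fraction of read pairs. The following is a minimal sketch of that arithmetic only; the function name is hypothetical, the mean tumor coverage of 51.8X is used as an illustrative input, and the exact subsampling settings used in the study are not stated here.

```python
def downsample_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of read pairs to keep to go from current to target coverage."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    # A sample already at or below the target keeps all of its reads.
    return min(1.0, target_coverage / current_coverage)


# Illustrative input: mean tumor coverage reported for the Fresh Frozen data.
fraction = downsample_fraction(current_coverage=51.8, target_coverage=5.0)
print(f"keep about {fraction:.1%} of read pairs")
```

In practice the fraction would presumably be computed per sample from its own coverage rather than from the cohort mean.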

The FFPE tissue samples were prepared using the ThruPLEX Tag-seq protocol from Takara Bio and sequenced on Illumina Novaseq 6000 with paired-end 2 × 150 bp. Sixteen of the 19 samples were sequenced twice at a shallow depth to check whether the lower sequencing depth affected the results. Two of the samples (Samples 11 and 42) were not resequenced as their coverage was already above 5X after Run 1, while Sample 10 was not resequenced due to lack of material. The mean coverage was 3X after Run 1 and, for the samples whose runs were merged, 5.96X after Run 2; coverage information can be found in Table S3. Data from Run 1 were used for the comparison to HRDetect. The WGS data was aligned to the human reference genome (GRCh37) using BWA-MEM27.

Determination of HRD-status

The HRDetect model was applied as previously described22, and the results will be regarded as the truth set for the comparisons conducted in this study. As part of the ShallowHRD R-script by Eeckhoutte et al.16,28, the aligned reads were analyzed using the recommended 50 kb sliding windows with QDNAseq29, which detects Copy Number Alterations (CNAs) and cut-offs for these. The cut-off calculation by ShallowHRD is based on the result of 100,000 simulations, and the cut-off value is termed the M-value. Another obtained value is the corrected Median Absolute Deviation (cMAD), measuring the level of fluctuations in the number of reads per sliding window. These two values indicate analysis quality: a cMAD value below 0.14 and an M-value in the interval 0.025–0.45 indicate good quality16. Additionally, ShallowHRD estimates the tumor content of each sample through the MAX2 value, where a value below 0.16 is considered to indicate low tumor content16.
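The quality thresholds above can be summarized as a small check. This is a sketch under stated assumptions: the names are hypothetical, the input values are illustrative, and treating the M-value interval as inclusive at both ends is an assumption, since the text does not specify the boundary behavior.

```python
from dataclasses import dataclass

CMAD_MAX = 0.14                  # cMAD must be below this value
M_VALUE_RANGE = (0.025, 0.45)    # assumed inclusive at both ends
MAX2_LOW_TUMOR = 0.16            # MAX2 below this suggests low tumor content


@dataclass
class QualityFlags:
    cmad_ok: bool
    m_value_ok: bool
    low_tumor_content: bool


def quality_flags(cmad: float, m_value: float, max2: float) -> QualityFlags:
    """Apply the quality thresholds quoted in the text to one sample."""
    low, high = M_VALUE_RANGE
    return QualityFlags(
        cmad_ok=cmad < CMAD_MAX,
        m_value_ok=low <= m_value <= high,
        low_tumor_content=max2 < MAX2_LOW_TUMOR,
    )


# Illustrative values, not taken from the study's tables.
print(quality_flags(cmad=0.08, m_value=0.12, max2=0.10))
```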

ShallowHRD determines HRD-status based on the CNAs by counting the number of LGAs, meaning segments larger than 3 Mb above the CNA cut-off value. As described by Eeckhoutte et al., samples with fewer than 15 LGAs are HRD-negative, while 20 or more LGAs signify HRD-positive. A sample with an LGA count in the interval 15 ≤ X ≤ 19 is instead placed in the category Borderline for visual investigation16. The visual investigation was conducted by 3 individuals, of whom 1 was blinded to the HRDetect status of the samples, and all without knowing the assessment of the other investigators. All 3 investigators reached the same conclusions regarding the reclassification.
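The LGA thresholds can be written as a simple rule. The sketch below is illustrative only; the function name and category strings are hypothetical and this is not the ShallowHRD implementation, which also involves the visual review step described above.

```python
def classify_lga(lga_count: int) -> str:
    """Map an LGA count to the category defined by the thresholds in the text."""
    if lga_count < 0:
        raise ValueError("LGA count cannot be negative")
    if lga_count >= 20:
        return "HRD-positive"
    if lga_count >= 15:
        return "Borderline"      # 15 to 19: sent to visual investigation
    return "HRD-negative"


for count in (3, 15, 19, 20):
    print(count, classify_lga(count))
```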

Comparison of HRD methods

The output of the ShallowHRD analyses was then compared to the HRDetect-status and assigned a category: True Positive or True Negative if the result agreed with the output from HRDetect, and False Positive or False Negative if it did not.
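The categorization against the HRDetect truth set can be sketched as follows. The function name is hypothetical, and the sketch assumes that Borderline samples have already been resolved to positive or negative.

```python
def comparison_category(shallowhrd_positive: bool, hrdetect_positive: bool) -> str:
    """Name the comparison category, with HRDetect treated as the truth set."""
    agreement = "True" if shallowhrd_positive == hrdetect_positive else "False"
    call = "Positive" if shallowhrd_positive else "Negative"
    return f"{agreement} {call}"


# A sample called HRD-negative by ShallowHRD but HRD-positive by HRDetect:
print(comparison_category(shallowhrd_positive=False, hrdetect_positive=True))
```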

Receiver operating characteristic (ROC) curve analysis was performed separately for each of the three datasets (Fresh Frozen 50X, Fresh Frozen 5X, and FFPE 1-5X). The area under the curve (AUC) values were used to assess the correlation between ShallowHRD and HRDetect. The statistical significance of the AUCs was calculated relative to HRDetect’s performance, which is assumed to have an AUC of 1.0, and to that of a random classifier, which corresponds to an AUC of 0.5. Furthermore, a previously estimated AUC for HRDetect of 0.9636 by Diossy et al.30 was compared to the ShallowHRD AUCs of this study.

AUCs, standard errors, and p-values were calculated based on the datasets generated in the study as well as after bootstrapping (1000 iterations) of these datasets. The analysis was performed using functions from Scikit-learn31 and scipy32 with the code available on GitHub33.
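To illustrate the idea of an AUC with a bootstrapped standard error, the following sketch uses only the Python standard library. It is a stand-in for, not a copy of, the scikit-learn and scipy analysis referenced above; the function names and the example data are hypothetical, and the AUC is computed through its rank interpretation (the probability that a positive sample scores higher than a negative one, with ties counted as one half).

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_iter=1000, seed=0):
    """Mean and standard deviation of the AUC over bootstrap resamples."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    if n_iter < 2:
        raise ValueError("n_iter must be at least 2")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        if len(set(resampled)) < 2:
            continue  # skip resamples that lost one of the two classes
        aucs.append(auc(resampled, [scores[i] for i in idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Hypothetical data: 1 = HRD-positive in the truth set, scores = LGA counts.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [25, 22, 8, 3, 12, 1, 20]
print("AUC:", auc(labels, scores))
print("bootstrap mean, SD:", bootstrap_auc(labels, scores, n_iter=1000, seed=1))
```

Skipping single-class resamples is one design choice among several; with small cohorts it slightly changes the resampling distribution, which is worth keeping in mind when interpreting bootstrapped standard errors.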

In addition to ROC curve analysis for evaluating ShallowHRD’s classification performance, Spearman rank correlation was used to assess score-level concordance. Correlation between ShallowHRD scores and HRDetect scores was calculated separately for each dataset (FFPE (1-5X), Fresh Frozen (5X), and Fresh Frozen (50X)) using the cor.test function from the R Stats Package34. Further, to assess the reproducibility of ShallowHRD scores across input types and sequencing depths, pairwise Spearman correlations were also performed between each of the ShallowHRD datasets.
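For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation of the ranks. The sketch below is a plain-Python illustration with hypothetical function names and made-up scores; it does not reproduce the R cor.test call used in the study and does not compute a p-value.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for illustration: LGA counts versus HRDetect probabilities.
lga_counts = [2, 5, 17, 22, 30]
hrdetect_scores = [0.01, 0.10, 0.05, 0.95, 0.99]
print(spearman_rho(lga_counts, hrdetect_scores))
```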

Results

Quality

Before comparing ShallowHRD to HRDetect, we examined the quality of the samples, the simulations, and the estimated tumor content of the samples. Most samples were estimated to have sufficient tumor content. However, 4 samples were estimated to have low tumor content in the Fresh Frozen (50X) dataset, 8 in the Fresh Frozen (5X) dataset, and 4 in the FFPE (1-5X) set. Two of these samples were estimated by ShallowHRD to have a low tumor content in all datasets (Table S4). This is only partly consistent with previously obtained data from Andersen et al., where the Battenberg output indicates a tumor content above 20% in all samples (see Table S7). Moreover, the number of substitutions found within the samples aligns well with what could be expected in tumor samples, and the data were of a quality usable in HRDetect analysis22.

When examining the ShallowHRD quality parameters, all samples, regardless of sequencing depth and tissue type, are of good quality, as the cMAD value is below 0.14 and the M-value is within the designated interval for all samples, see Table S4. Thus, it is estimated that the analysis was conducted on a dataset of reasonable quality.

ShallowHRD output

The LGA scores show slight variation across the data groups, as illustrated in Fig. 2A, which provides a detailed breakdown of the scores for each sample. This visual representation allows for easy identification of the HRD-status, with the ShallowHRD outputs for some samples (S11, S14, S21, S37) indicating a borderline HRD-status. Additionally, a boxplot displaying individual score distributions is available in Fig. S1, where the variation across the data groups was non-significant. As few completed ShallowHRD simulations were available in a single ShallowHRD analysis of Sample 10, the LGA score of Sample 10 is the average of ten ShallowHRD simulation runs (Fig. 2A). All individual scores were 20 or above, and the results of these separate analyses are available in Table S5.

The HRD-scores from the HRDetect analysis conducted by Andersen et al.22 are presented in Fig. 2B for comparison to the ShallowHRD score.

After resequencing and merging data for 16 samples, slight non-significant differences in LGA scores were identified. However, no category changes were detected (Fig. 2C and Table S4).

Fig. 2

HRD-Analysis Outputs. (A) ShallowHRD LGA scores are depicted as bars for each sample. Dashed lines indicate the borderline area (15 ≤ X ≤ 19). Values above the line at 19.5 are considered HRD-positive, and samples with scores below the line at 14.5 are considered HRD-negative from the ShallowHRD output. Asterisks represent ShallowHRD scores of 0. Sample names in bold and blue are the samples that tested HRD-positive in HRDetect. Sample 11 is the carrier of a germline VUS in RAD51D (c.202G > A); samples 10 and 37 have a hypermethylated BRCA1 promoter. Samples 42 and 47 are carriers of pathogenic germline variants in BRCA1 (c.2475delC) and BRCA2 (c.6490_6491delCA), respectively. (B) Visualization of HRDetect values for each sample. Dots in blue indicate HRD-positive values above 0.7 (dashed line). (C) Comparison of FFPE shallow sequencing of Run 1 and Merge (Run 1 + Run 2). The dashed lines indicate the borderline area (15 ≤ X ≤ 19). Samples with Merged scores above the horizontal line at 19.5 are considered HRD-positive, and samples with Merged scores below the horizontal line at 14.5 are considered HRD-negative. Samples with a Run 1 score to the right of the vertical line at 19.5 are considered HRD-positive, while those to the left of the vertical line at 14.5 are considered HRD-negative. In two sample pairs (samples 19/31 and samples 7/18), the two samples have identical values in Run 1 and Merged, leading to overlapping points at (1,1) and (2,2), respectively. No significant differences between the runs were identified (pairwise Wilcoxon signed-rank test, significance threshold p < 0.05).

Comparison to HRDetect

When comparing the ShallowHRD scores to the HRDetect status, each result was categorized as True Negative, True Positive, False Negative, False Positive, or Borderline if the ShallowHRD score was between 15 and 19. The categorization result is represented in Fig. S2. Here, it is also worth noting that the categorization did not change between Run 1 and the Merged result of the FFPE tissue (Table S4).

It was apparent that a visual inspection of the borderline results was required to complete the categorization and comparison. The visual inspection recategorized the borderline samples into one of the other categories, visualized in Fig. 3A. The individual sample categories are available in Table S4, where it also becomes evident that the categories of the borderline samples after visual assessment are identical to the categorization of the sample in the other tissue datasets.

Slight differences between the tissues are detected in the True Negative and False Positive categories, where ShallowHRD incorrectly categorized two Fresh Frozen (50X) samples and one Fresh Frozen (5X) sample as HRD-positive. However, these differences are not significant (Table S4), and the ShallowHRD scores were highly correlated across the datasets. FFPE (1-5X) scores correlated strongly with those from Fresh Frozen (5X) (Spearman ρ = 0.79, p = 5.1 × 10^−5) and Fresh Frozen (50X) (ρ = 0.79, p = 5.8 × 10^−5). A strong correlation was also observed between Fresh Frozen (5X) and (50X) (ρ = 0.88, p = 5.1 × 10^−7). These results suggest that ShallowHRD provides reproducible estimates of HRD-associated genomic instability across tissue types and sequencing depths. For all three tissue groups, Sample 11 is incorrectly categorized as HRD-negative. Taken together, this gives ShallowHRD a sensitivity of 80% relative to HRDetect for all tested tissue groups, while the specificity is 85.7% in Fresh Frozen (50X), 92.9% in Fresh Frozen (5X), and 100% in the FFPE (1-5X) analysis.
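The reported percentages follow from simple counts. In the sketch below, the counts are derived from the text rather than copied from a table: 19 samples with 5 HRD-positive by HRDetect leaves 14 HRD-negative, Sample 11 is the single false negative in each dataset, and the false positives number two, one, and zero. The variable and function names are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truth-set positives that were called positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truth-set negatives that were called negative."""
    return tn / (tn + fp)


# Counts derived from the text (5 positives and 14 negatives per dataset).
counts = {
    "Fresh Frozen (50X)": {"tp": 4, "fn": 1, "tn": 12, "fp": 2},
    "Fresh Frozen (5X)": {"tp": 4, "fn": 1, "tn": 13, "fp": 1},
    "FFPE (1-5X)": {"tp": 4, "fn": 1, "tn": 14, "fp": 0},
}

for dataset, c in counts.items():
    sens = 100 * sensitivity(c["tp"], c["fn"])
    spec = 100 * specificity(c["tn"], c["fp"])
    print(f"{dataset}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```

With only five truth-set positives, a single misclassified sample shifts the sensitivity by 20 percentage points, which illustrates how coarse these estimates are at this cohort size.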

Fig. 3

Comparison to HRDetect. (A) Result of the comparison of ShallowHRD-generated HRD-status to that of HRDetect after visual inspection of Borderline samples. Kruskal-Wallis test and pairwise Wilcoxon rank sum tests with Bonferroni correction were performed, and no significant differences between the tissue groups were found. Results of each sample’s significance tests and category are available in Table S4. (B) Receiver operating characteristic (ROC) curves for each data set (Fresh Frozen 50X, Fresh Frozen 5X, and FFPE (1-5X)) illustrating the overall performance of ShallowHRD relative to HRDetect. Pink dots indicate the location of the cutoff for HR deficiency (LGAs > 19), and cyan dots indicate the cutoff for HR proficiency (LGAs < 15). Values for the curves can be found in Table S6.

To assess the overall performance of ShallowHRD compared to HRDetect in the determination of HRD-status, a ROC curve analysis was performed (Fig. 3B). AUC values and p-values for each data set, with and without bootstrapping, are presented in Table 1, where it becomes evident that ShallowHRD is significantly different from a random classifier in all data sets. The non-bootstrapped data sets are not statistically different from the assumed AUC of HRDetect (1.0) or the AUC achieved by Diossy et al.30, while the bootstrapped datasets are significantly different in all comparisons, except for Fresh Frozen 5X against the Diossy et al.30 AUC.

Table 1 P-values when comparing the ShallowHRD dataset AUCs to HRDetect (AUC 1.0) and a random classifier (AUC 0.5).

Spearman rank correlation analysis also indicated a strong positive correlation between HRDetect and ShallowHRD scores in FFPE (1-5X) samples (ρ = 0.68, p = 0.0013). Correlations were moderate but still statistically significant in Fresh Frozen (5X) (ρ = 0.58, p = 0.0086) and Fresh Frozen (50X) samples (ρ = 0.50, p = 0.029). These findings indicate general concordance between the two methods at the score level, particularly in the clinically relevant FFPE setting.

Discussion

The aim of our study was to compare ShallowHRD to HRDetect in breast cancer tissue. After visual inspection of genomic profiles, we found that ShallowHRD performed with high specificity and sensitivity in all data groups. However, slight differences between the data groups were detected, with the FFPE analysis outperforming the Fresh Frozen datasets when focusing on false positives. This may be attributable to improvements implemented in preparation kits in the time from sequencing of Fresh Frozen to FFPE tissue. The specificity values observed in our study align well with those previously reported, while the slightly lower sensitivity is likely influenced by the limited sample size16.

The ROC curve, AUC values, and statistical significance analyses illustrate that ShallowHRD performs significantly better than a random classifier. No statistical differences were found when comparing the ShallowHRD AUC to that of HRDetect from Diossy et al.30 or the perfect classifier AUC. When bootstrapping the ShallowHRD output, only the Fresh Frozen 5X dataset produced AUC values statistically similar to those from Diossy et al.; all other bootstrapped values differed significantly.

In summary, ShallowHRD performs significantly better than a random classifier in its ability to determine HRD-status, and it displays AUC performance similar to HRDetect in both FFPE and frozen tissue datasets. In the study by Davies et al., the sensitivity of HRDetect was reported to decrease from nearly 1.0 at 30X coverage to 0.86 at 10X coverage in fresh frozen tissue. Additionally, HRDetect was applied to an FFPE-derived sample with a known pathogenic BRCA1 variant and correctly predicted HRD, despite formalin-induced artifacts that prevented detection of a key mutational signature15. However, broader FFPE validation was not conducted. As such, with the 0.8 sensitivity achieved in this study and a comparable AUC, ShallowHRD may offer a more practical alternative when FFPE tissue is the only option available in the assessment of HRD-status in breast cancer, as previously demonstrated for ovarian cancer11.

Further, the observed Spearman correlation between HRDetect and ShallowHRD scores, especially the strong correlation in FFPE samples, supports the hypothesis that ShallowHRD captures key features of HRD-associated genomic instability. While correlation does not imply equivalence or classification agreement, these results suggest that ShallowHRD and HRDetect scores track together in magnitude, which may be useful for further validation and calibration efforts.

In all three data groups, the HRD-positive Sample 11 was the only case misclassified as HRD-negative by ShallowHRD. This sample carried a germline VUS in RAD51D, alongside a somatic BRCA1 variant considered likely benign22. Structural variants have been well-characterized as a result of HRD and the subsequent genomic instability mediated by BRCA1 and BRCA2 dysfunctionality35,36,37,38, and as a part of the HR pathway, RAD51D dysfunction would also lead to HRD and higher levels of genomic instability39,40.

However, the identified RAD51D variant is currently classified as a VUS, even though minigene RNA data suggest that the variant affects splicing41. The findings from the ShallowHRD analysis could nevertheless indicate a tendency towards a likely benign classification, thereby going against the findings of the study using HRDetect22. A potential reason for this disagreement between ShallowHRD and HRDetect could be a less structurally disruptive subtype of HRD resulting from the RAD51D variant.

Therefore, further studies into the specific RAD51D VUS as well as known RAD51D pathogenic and benign variants are warranted to clarify what causes the observed discrepancy between ShallowHRD and HRDetect. Understanding whether deleterious RAD51D variants lead to the same level of structural variants, such as the LGAs, as pathogenic BRCA1 and BRCA2 variants would help evaluate ShallowHRD’s performance. Such analyses could potentially indicate if gene-specific cut-offs are required. If RAD51D variants produce a different pattern of structural variants, this could explain why ShallowHRD fails to assess this sample accurately when compared to HRDetect’s more exhaustive analysis of HRD signatures.

This discrepancy also raises a broader question about how the performance characteristics of HRD classifiers, particularly their sensitivity and specificity, affect their clinical utility. HRD-status is useful for several clinical aspects, most notably as a predictive marker for response to PARP inhibitors, and potentially as supportive evidence in the future classification of VUS. Depending on the intended use, a higher sensitivity or specificity of a classifier would be favored. As a predictor of treatment response, a high level of sensitivity is needed to minimize the number of false negatives to avoid missing patients who would have benefited from the treatment. Conversely, for VUS classification, high specificity is presumed to be more important. If a VUS is present in an HR gene, a truly pathogenic VUS would be expected to lead to HRD and increased genomic instability. However, studies have shown that HRD is present in tumors without pathogenic variants42,43. Therefore, the absence of HRD in a tumor carrying a VUS may support a benign classification, but only if the test has high specificity and reliably identifies true negatives. Ultimately, both sensitivity and specificity matter in each context, and the clinical utility of HRD testing depends on balancing these priorities. For example, overly low specificity in treatment prediction may lead to overtreatment, while overly low sensitivity in VUS classification could falsely support benign interpretation of truly pathogenic variants. These considerations underscore the importance of evaluating not only biological relevance but also methodological robustness in HRD assessment. As of publishing this paper, it is unclear which HRD algorithm is optimal for VUS classification and what strength HRD-status can provide for classification. However, the ENIGMA44 consortium is currently conducting studies to define classification strength of HRD-status for both benign and pathogenic classification purposes.

In our study, ShallowHRD achieved a sensitivity of 80% and a specificity ranging from 85.7 to 100% across the three datasets. While the sensitivity may limit its immediate clinical utility as a standalone predictor of treatment response, the high specificity, particularly in the FFPE samples, supports its potential value in research settings, including as supportive evidence in VUS classification. However, these findings need further verification in larger cohorts.

Further limitations of our study should also be accounted for. First, the sample size is small (n = 19), limiting our findings’ statistical power and generalizability. The retrospective nature of the study and reliance on available matched FFPE and frozen tissue samples introduce potential selection bias, as tissue quality and availability may not reflect routine clinical populations. While bootstrapping was applied to explore robustness, no formal internal or external validation of ShallowHRD performance was conducted, and the risk of overfitting and type I or II statistical errors cannot be excluded.

Future work should aim to validate these promising preliminary findings in larger, prospectively collected cohorts with more BRCA1/2-deficient tumors, ideally across multiple institutions. Using additional independent methods to assess HRD, such as functional assays or clinical treatment response, could also be valuable in evaluating the clinical relevance. Ultimately, such studies are needed to determine if ShallowHRD can serve as a reliable, cost-effective diagnostic tool for HRD detection in routine clinical practice for breast cancer, as this preliminary study suggests.

Conclusions

The performance of ShallowHRD is, in this preliminary study, comparable to that of HRDetect, making it a potentially quicker and more economical alternative for assessing HRD-status due to the lower sequencing depth of malignant tumor tissue and its ability to perform without paired sequencing of normal tissue. Furthermore, the ability of ShallowHRD to work on data from FFPE tissue makes it usable for HRD-status assessment for most patients with breast cancer. ShallowHRD differed from HRDetect in the assessment of one sample. Additional research into the cause of this difference is required, and to draw definitive conclusions on ShallowHRD sensitivity, examination and validation in larger, external cohorts are required.

In conclusion, this study indicates that using ShallowHRD to assess HRD-status of malignant breast tumors has substantial potential in a clinical setting.