Introduction

Cardiomyopathies represent a heterogeneous family of diseases often leading to progressive heart failure with significant morbidity and mortality1, and are classified as either primary or secondary. The former are predominantly confined to the heart muscle1,2, and have genetic, acquired, or mixed causes, while the latter are characterized by myocardial damage resulting from systemic or multi-organ disease2,3. Members of this family include dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy, restrictive cardiomyopathy, and arrhythmogenic right ventricular cardiomyopathy3.

DCM is the most common form of human cardiomyopathy, with a reported incidence of 5 to 7 in 100,000 adults and 0.57 in 100,000 children4,5. It is the third leading cause of heart failure in the United States behind coronary artery disease and hypertension1,3. DCM results in an enlarged and weakened heart, with symptoms including shortness of breath, fatigue, orthopnea, and edema1,2,3. Left ventricle dilation and contractile dysfunction characterize the disease. Consequently, the heart cannot pump sufficient blood throughout the rest of the body, further increasing susceptibility to complications like arrhythmias, blood clots, and sudden death.

Dogs share many hereditary diseases with humans6; as such, they provide a strong genetic model for human disease. The reduced heterogeneity amongst purebred dogs in particular provides a rich setting for causative mutation identification7. Moreover, DCM is reported to progress similarly in dogs and humans8, developing across multiple phases. The first phase is an asymptomatic period in which there are no functional changes in cardiac tissue yet, although the underlying causes may already be initiating the disease8. Cardiovascular changes, both electrical and morphological, characterize the next stage; these changes are detectable via Holter monitors and echocardiography. Because subjects are still asymptomatic, this stage is often termed “occult DCM”8,9. Finally, subjects present clinical signs of heart failure during the “overt stage” of DCM.

Doberman Pinschers are one of the breeds most susceptible to DCM; the cumulative prevalence within a European cohort was estimated at 58.3%10. The disease’s high prevalence, coupled with reduced heterogeneity within the breed, makes the Doberman Pinscher an ideal model for genome-wide association studies (GWAS). Indeed, as in human DCM, contemporary literature suggests an important relationship between genetic predisposition and DCM pathology in Doberman Pinschers11.

The first published GWAS for Doberman Pinscher DCM, conducted by Mausberg et al., identified an association on CFA5 in a cohort of German dogs, but did not identify specific candidate genes12. A subsequent GWAS by Meurs et al. reported an association with the gene encoding PDK4 on CFA14 (ref. 13). Owczarek-Lipska et al. attempted to replicate the PDK4 association in a European cohort, but were unsuccessful14. In a study of a family of Doberman Pinschers without the PDK4 variant, Meurs et al. reported a variant in the TTN gene associated with the disease15. However, in a follow-up study of the PDK4 and TTN associations, the authors concluded that while the TTN variant was most common in a cohort of 48 affected Doberman Pinschers, 6 dogs had neither the PDK4 nor the TTN variant16. In contrast, Niskanen et al. later refined and replicated Mausberg et al.’s findings using a discovery cohort of German dogs and identified two candidate genes: RNF207 and PRKAA2, known for their involvement in cardiac action potentials17. The study reported risk allele frequencies of 50.5% in cases (\(N=178\)) and 25.9% in controls (\(N=143\)) for the chr5:53,109,178G>A variant. We could not find SNP array-based frequencies for the other reported variant, chr5:60,111,983G>A. Additionally, Niskanen et al.’s RNA analyses revealed alternatively spliced transcripts in RNF207 and PRKAA2, suggesting that disrupted RNA processing might contribute to the molecular mechanisms underlying DCM.

Though associations between mutations and Doberman Pinscher DCM have been demonstrated already, there remains a clear need for further study. Since there are cases of Doberman Pinscher DCM with neither the PDK4 nor titin mutations (and vice versa), these genes are not solely responsible for the high prevalence within the breed. However, the replication of findings on chromosome 5 in a cohort of German dogs strengthens the argument for this locus, though additional validation is needed. Furthermore, the observed allele frequencies in affected and unaffected dogs suggest that this variant is not the sole factor driving DCM in Doberman Pinschers. Thus, deeper exploration is imperative to improve diagnosis, treatment, and breeding strategies. Studying DCM in Doberman Pinschers also offers a valuable opportunity to uncover novel insights into the disease’s manifestation and progression in humans.

Methods

Doberman diversity project database

The Doberman Diversity Project (DDP) provided collaborative access to their database of health profiles for more than 3000 Doberman Pinschers. The owners of these dogs have given permission for their dogs’ phenotype information from DDP as well as their genotype information from Embark to be shared with researchers. Other studies have also focused on the DDP database; for instance, Wade et al. recently investigated the population structure in the DDP database, particularly in relation to the geography and purpose for which the dogs were bred18. For all of our analyses, we selected 46 dogs already in the DDP database with confirmed DCM diagnoses from veterinary cardiologists for the case group. Initially, in a preliminary experiment, we selected two control groups based on the following criteria:

  1. Control Group I:

    Lived at least 9 years and had healthy Holter/ECG reports at age 7 or older.

  2. Control Group II:

    Lived at least 9 years, but either had no available health reports or the Holter/ECG was performed before age 7.

Each of these control groups yielded 27 dogs, for a case-control study consisting of 100 Doberman Pinschers. De-identified data sets for the case group, Control Group I, and Control Group II, as well as for the entire 3272-dog population, were provided by the DDP in collaboration with Embark, which housed the full genotype data at the time of the study. However, this comparison of cases to controls lacked sufficient statistical power to draw robust conclusions (see Supplementary File 1).

To address this limitation, we aimed to increase the database’s depth and statistical power for future analyses, which may extend beyond the current focus on DCM. We corresponded directly with listed breeders and owners to obtain health updates for participating dogs. Through these communications, we discovered that at least two of our controls later developed DCM. Given the high prevalence of the disease, it is likely that our strict control groups were further compromised. Consequently, we opted to use a population control approach, treating unphenotyped samples as controls, and utilized the entire existing database for this study. The analysis that follows compares the original 46 cases with these population controls.

Ethics statement

The data analyzed in this paper were obtained with permission of the database owner, the Doberman Diversity Project. Owners who participate in the Doberman Diversity Project do so after providing consent for de-identified data to be shared for research purposes. No animals were sampled directly for this study. All methods were carried out in accordance with relevant guidelines and regulations.

Genotyping

Embark19 performed the genotyping for 3272 participating dogs. Embark’s custom microarray, originally based on the Illumina CanineHD BeadChip, contains probes for 216,184 single nucleotide polymorphisms (SNPs). For maximum coverage, SNPs were placed throughout the genome according to the CanFam3 reference sequence (ref. 20).

Statistical analysis

Data cleaning and quality control

Of the 3272 dogs, 46 were cases and 3226 were missing phenotype information. We treated those dogs with no phenotype data as controls for this study. We performed stringent preliminary quality control to assess and improve data integrity prior to association analysis. Filtering measures were conducted for both samples and SNPs to ensure robust association testing. We used KING21 to identify duplicate samples and PLINK v1.90 (ref. 22) for all remaining filtration steps.

Our analysis dealt exclusively with autosomal SNPs; that is, we removed data corresponding to sex chromosomes and mitochondrial DNA. We then removed the case dogs necessarily duplicated in the unphenotyped group (a list of these duplicates is provided in Supplementary Table S3). Afterward, we discarded another 14 duplicates identified within the unphenotyped subjects. We did not filter any further on relatedness; instead, we computed and utilized the genetic relatedness matrix (GRM) to fit the association model.
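For readers unfamiliar with the GRM, the following is a minimal sketch of one common definition, in which each SNP is standardized by its allele frequency and the matrix is the average cross-product over SNPs. It is an illustration only: the function name and the toy input layout are hypothetical, mean imputation of missing calls is an assumption, and the software used in this study may compute the matrix differently and far more efficiently.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # alternate-allele count (0, 1, 2); None marks a missing call


def genetic_relatedness_matrix(genotypes: Sequence[Sequence[Genotype]]) -> List[List[float]]:
    """Return an n x n GRM from a SNP-major table (hypothetical helper).

    genotypes[i][j] is the alternate-allele count of dog j at SNP i.
    """
    n_dogs = len(genotypes[0])
    grm = [[0.0] * n_dogs for _ in range(n_dogs)]
    n_used = 0
    for snp in genotypes:
        called = [g for g in snp if g is not None]
        if not called:
            continue
        p = sum(called) / (2 * len(called))  # alternate-allele frequency
        variance = 2 * p * (1 - p)
        if variance == 0:  # monomorphic SNPs carry no information
            continue
        # Standardize; missing calls are mean-imputed and so contribute 0.
        z = [0.0 if g is None else (g - 2 * p) / variance ** 0.5 for g in snp]
        for a in range(n_dogs):
            for b in range(a, n_dogs):
                grm[a][b] += z[a] * z[b]
        n_used += 1
    if n_used == 0:
        raise ValueError("no polymorphic SNPs available")
    # Average over SNPs and mirror the upper triangle.
    for a in range(n_dogs):
        for b in range(a, n_dogs):
            grm[a][b] /= n_used
            grm[b][a] = grm[a][b]
    return grm
```

In this toy form, two dogs with identical genotypes receive an off-diagonal entry equal to their diagonal entries, which is the behavior one would expect for duplicate samples.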

Next, we removed 13,678 SNPs and 8 dogs because of missing values using both relaxed (--geno 0.2; --mind 0.2) and strict (--geno 0.02; --mind 0.02) thresholds23. An additional 87,611 SNPs failed to meet the minor allele frequency (MAF) cutoff of 5% and were therefore excluded from analysis (--maf 0.05). We removed another 5 dogs due to excessive heterozygosity and 74 SNPs which strongly deviated from Hardy-Weinberg equilibrium (\(P < {1\times {10}^{-6}}\)).
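The SNP-level missingness and MAF filters can be sketched in a few lines of plain Python. This is an illustrative reimplementation of the filtering logic on a toy genotype table, not the PLINK code itself; the function names and the input layout are hypothetical, and the sample-level, heterozygosity, and Hardy-Weinberg filters are not shown.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # alternate-allele count (0, 1, 2); None marks a missing call


def missing_rate_and_maf(snp: Sequence[Genotype]) -> Tuple[float, float]:
    """Return (fraction of missing calls, minor allele frequency) for one SNP."""
    called = [g for g in snp if g is not None]
    missing_rate = 1 - len(called) / len(snp)
    if not called:
        return missing_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    return missing_rate, min(alt_freq, 1 - alt_freq)


def passing_snps(
    genotypes: Sequence[Sequence[Genotype]],
    max_missing: float = 0.02,  # mirrors the strict missingness threshold
    min_maf: float = 0.05,      # mirrors the 5% MAF cutoff
) -> List[int]:
    """Return the indices of SNPs that survive both filters."""
    kept = []
    for index, snp in enumerate(genotypes):
        missing_rate, maf = missing_rate_and_maf(snp)
        if missing_rate <= max_missing and maf >= min_maf:
            kept.append(index)
    return kept
```

A SNP is dropped when its missing-call rate exceeds the threshold or its MAF falls below the cutoff, so a monomorphic SNP and a SNP with 30% missing calls would both be excluded.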

In total, 3199 dogs and 103,849 SNPs remained for association analysis.
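The sample count can be reconciled with the filtering steps by simple arithmetic. The sketch below assumes that one duplicated record was removed for each of the 46 case dogs; that count is an inference from the totals rather than a figure stated explicitly in the text, and the variable names are illustrative.

```python
STARTING_DOGS = 3272

# (reason, number of dogs removed), in the order the steps are described.
SAMPLE_REMOVALS = [
    ("case records duplicated in the unphenotyped group (assumed 46)", 46),
    ("duplicates within the unphenotyped dogs", 14),
    ("excess missing genotype calls", 8),
    ("excessive heterozygosity", 5),
]


def remaining_after(start: int, removals) -> int:
    """Subtract each removal count from the starting sample size."""
    return start - sum(count for _, count in removals)


remaining_dogs = remaining_after(STARTING_DOGS, SAMPLE_REMOVALS)
```

Under that assumption the tally gives 3272 − 46 − 14 − 8 − 5 = 3199 dogs, which matches the reported total.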

Association testing

To account for sample structure, we used a generalized linear mixed model for the association testing. Mixed models are statistical models consisting of both fixed effects and random effects. They are most useful in settings involving clusters of related statistical units24. As such, mixed models are effective in preventing type I errors (i.e., false positive associations) resulting from sample structure. In these models, each SNP is tested to determine whether the variance of the genetic effect at that locus significantly deviates from zero25. We used SAIGE26 for the association testing because of the included saddlepoint approximation (SPA), which adjusts test statistics to correct for unbalanced case-control ratios.

We analyzed the results of the association test graphically using both quantile–quantile (Q–Q) and Manhattan plots. Q–Q plots allow for the comparison of two probability distributions. Specifically, they show the extent to which the distribution of observed P-values deviates from the expected distribution under the null hypothesis of no association. Early deviation from the expected distribution is indicative of population stratification. To obtain a Manhattan plot, we plotted the \(-\log _{10}(P)\) value of each SNP against its genomic position. In addition, we calculated the genomic inflation factor \(\lambda _{\textrm{GC}}\) to measure systematic bias and stratification in our study.
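As an illustration of these diagnostics, the sketch below computes the genomic inflation factor and the Q–Q plot coordinates from a list of P-values. It assumes the usual median-based definition of \(\lambda _{\textrm{GC}}\) (the median observed 1-df chi-square statistic divided by the median of the null distribution) and the plotting positions \((i - 0.5)/n\); the exact conventions used for this study are not stated, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution (about 0.455).
_NULL_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_p(p: float) -> float:
    """Convert a two-sided P-value in (0, 1] to a 1-df chi-square statistic."""
    return _STD_NORMAL.inv_cdf(p / 2) ** 2


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Median observed chi-square statistic over the null median."""
    return median(chi2_from_p(p) for p in p_values) / _NULL_MEDIAN


def qq_points(p_values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) -log10(P) pairs for a Q-Q plot."""
    n = len(p_values)
    observed = sorted(p_values)
    return [
        (-math.log10((rank + 0.5) / n), -math.log10(p))
        for rank, p in enumerate(observed)
    ]
```

P-values that follow the null (uniform) distribution give a factor close to 1 and Q–Q points on the diagonal, whereas systematically small P-values push the factor above 1.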

Candidate gene identification

Associated regions identified during testing were further explored in the UCSC genome browser according to the canFam3 reference genome27. To identify nearby genes, we extended the analysis by including 100 kb flanking regions on either side of the identified loci.

Results

Table 1 The 10 most significant SNPs from our analysis sorted by \(P_\textrm{spa}\). Note that the cluster of SNPs on CFA16 represents the only signals with \(P_\textrm{spa}\) on the order of \(10^{-6}\). SNP positions are aligned according to the canFam3 reference genome.

Results from our analysis are given in Table 1 and Fig. 1. Specifically, details for the 10 most significant markers are provided in Table 1, along with locations, raw and adjusted P-values, reference and alternate alleles, and allele frequencies in cases and controls. We focused subsequent analysis on chromosome 16 because it provided the strongest signal.

Figure 1 details a more complete view of the results, including Q–Q and Manhattan plots in the top panels. Note the Q–Q plot in the top-left panel of Fig. 1 confirms our analysis accurately controlled for population stratification (\(\lambda _{\textrm{GC}} \approx 1.058\)). The most compelling signal in the top-right panel of Fig. 1 implicates a cluster of loci near CFA16:31,161,738 bp–31,408,726 bp. The leading marker in this region—SNP BICF2G630113368—has saddlepoint adjusted \(P_\textrm{spa} = {3.687\times {10}^{-6}}\) (\(P_\textrm{raw} = {4.02\times {10}^{-7}}\)). We identified this variant in approximately 60.8% of cases and in nearly 30.3% of population controls.

Fig. 1

Results from case group vs. population controls. The Q–Q plot in the top left indicates adequate control for population stratification (\(\lambda _{\textrm{GC}} \approx 1.058\)). The Manhattan plot reveals a signal on chromosome 16 (\(P_\textrm{spa} = {3.687\times {10}^{-6}}\), \(P_\textrm{raw} = {4.02\times {10}^{-7}}\)). The bottom panel highlights the window around this signal, including 100 kb bands on either end. Signaled SNPs are shown in red. Genes within this window according to the canFam3 reference genome include: DUSP26 (31,115,381–31,118,532 bp), RNF122 (31,135,967–31,144,921 bp), TTI2 (31,161,076–31,168,142 bp), MAK16 (31,166,051–31,179,981 bp), and FUT10 (31,194,629–31,266,964 bp).

The bottom panel of Fig. 1 offers a closer view of this region on chromosome 16. Specifically, we overlaid nearby genes relative to the reported SNPs in the UCSC genome browser according to the canFam3 reference genome. We included flanking regions of 100 kb on either end of the signaled region and noted a total of five nearby genes: DUSP26 (31,115,381–31,118,532 bp), RNF122 (31,135,967–31,144,921 bp), TTI2 (31,161,076–31,168,142 bp), MAK16 (31,166,051–31,179,981 bp), and FUT10 (31,194,629–31,266,964 bp).
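The flanking-window lookup amounts to an interval-overlap test, sketched below with the region boundaries and gene coordinates reported above. The function name is illustrative, and the sketch stands in for the manual genome browser inspection rather than reproducing it.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp on CFA16, canFam3 coordinates

ASSOCIATED_REGION: Interval = (31_161_738, 31_408_726)

GENES: Dict[str, Interval] = {
    "DUSP26": (31_115_381, 31_118_532),
    "RNF122": (31_135_967, 31_144_921),
    "TTI2": (31_161_076, 31_168_142),
    "MAK16": (31_166_051, 31_179_981),
    "FUT10": (31_194_629, 31_266_964),
}


def genes_in_window(region: Interval, genes: Dict[str, Interval], flank: int = 100_000) -> List[str]:
    """Return genes overlapping the region after extending it by `flank` bp per side."""
    window_start = region[0] - flank
    window_end = region[1] + flank
    return [
        name
        for name, (gene_start, gene_end) in genes.items()
        if gene_start <= window_end and gene_end >= window_start
    ]
```

With the 100 kb flanks all five genes fall inside the window, whereas with `flank=0` only TTI2, MAK16, and FUT10 overlap the associated region itself; DUSP26 and RNF122 are captured only because of the upstream flank.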

Discussion

We identified a region of suggestive significance near CFA16:31,161,738–31,408,726 bp associated with DCM in a cohort of 3272 Doberman Pinschers. To the best of our knowledge, no prior associations between Doberman Pinscher DCM and CFA16 have been demonstrated. Importantly, we did not replicate the associations previously identified on chromosomes 14 (ref. 13), 36 (ref. 15), or 5 (refs. 12, 17). Moreover, we presented a unique analytical approach, utilizing a broad population control group for exploratory analysis. This method offers a pragmatic means of uncovering potential associations that may have otherwise remained undetected.

Using this approach, we identified 5 candidate genes for further analysis (Fig. 1). A review of the existing literature did not reveal a connection between the RNF122 and MAK16 genes and cardiovascular function or disorder. Amongst the remaining candidates, DUSP26 may offer particular insight. In a 2021 study, Zhao et al. concluded that DUSP26 protected against pressure overload-induced cardiac hypertrophy in mice. Specifically, the authors noted “cardiac-specific overexpression of DUSP26 mice showed attenuated cardiac hypertrophy and fibrosis, while deficiency of DUSP26 in mouse hearts resulted in increased cardiac hypertrophy and deteriorated cardiac function”28.

Our review of the existing literature also indicates that the TTT complex, formed when TTI2 interacts with TELO2 and TTI1, plays a role in related conditions. Notably, congenital heart disease has been observed in patients with mutations in components of this complex. These interactions are essential for the maturation of phosphatidylinositol 3-kinase-related protein kinases29.

Similarly, genes involved in cellular responses under stress conditions, like FUT10, may also be relevant. FUT10 is a regulator of cell proliferation and is upregulated in ischemic heart disease. Mulari et al. demonstrated that, in patients undergoing bypass surgery, an increased expression of FUT10 is correlated with coronary artery obstruction complexity and with NT-proBNP levels30.

We initially restricted our search to the associated region itself and the 100 kb bands on either end of it. However, extending our window of investigation out to 500 kb reveals one additional candidate, Neuregulin-1 (NRG1). NRG1 is secreted by vascular endothelial cells in response to injury or stress and activates ErbB family receptors. In adults, the NRG1/ErbB pathway is viewed as compensatory and attenuates cardiac remodeling, fibrosis, and inflammation31,32. Additionally, loss of function of NRG1, ErbB2 or ErbB4 in mouse models results in a dilated cardiomyopathy phenotype33.

While this region represents a novel association, we emphasize the need for replication studies and further analysis to corroborate our findings. The candidate genes introduced require in-depth investigation to establish their role in Doberman Pinscher DCM pathogenesis. Gene expression studies, for example, may help ascertain their true relevance.

It is crucial to acknowledge the inherent limitations of this study. Primarily, our broad definition of a “control” group, necessitated by our analytical approach of utilizing unphenotyped samples, introduces a degree of uncertainty that must be considered. In addition, the relatively small pool of 46 cases restricts alternative approaches that might be considered with a larger case cohort. For instance, with an expanded case group, fitting a Bayesian sparse linear mixed model to predict phenotypes may offer more robust insights34. Moreover, variability in diagnostic criteria used by different veterinarians could affect the consistency of our case data. Finally, although we controlled for population stratification to the best of our ability, the de-identification process prevented us from accessing information on the ancestral origins of our case group, which may limit the precision of our findings. These limitations underscore the importance of validation studies through independent cohorts.

In conclusion, we have identified a region containing five genes on CFA16 that may be associated with DCM in the Doberman Pinscher. Future analysis, including replication and functional studies in independent cohorts, is essential to assess the strength and overall relevance of these candidates. Importantly, this research represents another step in bridging the gap between canine and human cardiac genetics, offering potential insights that could inform and improve our understanding of DCM in both species.