Introduction

Polycystic kidney disease (PKD) is a kidney disorder characterised by the presence of cysts on and enlargement of the kidneys, leading to progressive kidney failure [1]. Extrarenal manifestations can occur with PKD, including hepatic and pancreatic cysts, and intracranial aneurysms [2]. PKD can occur with autosomal dominant (ADPKD) or autosomal recessive (ARPKD) inheritance. Pathogenic variants in PKD1 and PKD2 account for 75–80% and 15–20% of ADPKD cases, respectively [2]. Other genes including IFT140, GANAB, ALG9, and DNAJB11 are known to cause atypical ADPKD phenotypes [3,4,5,6,7,8]. ARPKD is a rare form of PKD, with nearly all suspected cases harbouring PKHD1 mutations [9], or, rarely, mutations in DZIP1L [10]. However, 6–13% of PKD patients lack a genetic diagnosis even after appropriate genetic testing [11,12,13].

The severity of ADPKD is correlated with the type of pathogenic variant. Patients with PKD1-associated disease have a more severe disease outcome compared to those with pathogenic variants in PKD2 [11]. Protein-truncating variants in PKD1 (PKD1-T) are associated with earlier kidney failure relative to PKD1 non-truncating (PKD1-NT) and PKD2 variants (about 18 years or 28 years earlier, respectively) [11,13,14]. Individuals with a pathogenic missense variant in either PKD1 or PKD2 experience less severe disease progression. Missense variants in PKD1 and PKD2 can exhibit variable penetrance and therefore are often classified as variants of uncertain significance (VUS) [15].

Copy number variants (CNVs) are deleted or duplicated segments of DNA, typically defined as larger than 50 bp [16]. CNVs impacting PKD-associated genes are known to cause about 1–5% of genetically-defined PKD cases [12,13,17,18]. In addition to PKD, there are multiple examples of other genomic disorders resulting from a CNV encompassing a renal disease gene. For example, a recurrent deletion on chromosome 16 impacting PKD1 and TSC2 results in the development of both ADPKD and tuberous sclerosis complex [19]. A recurrent 17q12 deletion encompassing HNF1B results in renal abnormalities and cysts, but also neurodevelopmental phenotypes including autism and neuropsychiatric comorbidities [20]. Another recurrent deletion encompassing NPHP1 causes nephronophthisis [21] and can result in Joubert syndrome [22].

As most cases of PKD are caused by short nucleotide variants (SNVs), CNVs are generally only assessed after a negative screen of SNVs. The gold standard for testing for CNVs in PKD is multiplex ligation-dependent probe amplification (MLPA), a resource-intensive, gene-specific, and expensive test [23]. Array-CGH can also be used, but high-resolution arrays are needed for detection of smaller (intragenic) CNVs. Array-CGH and MLPA are both relatively costly compared to NGS and must be analysed in a different workflow to NGS variant detection [12, 24]. This separation of testing workflows may result in increased time to a molecular diagnosis for people with CNV-related PKD, or lack of diagnosis where MLPA or array-CGH is unavailable.

CNV calling from NGS data is a well-established practice, and tools have been developed for discovering CNVs from targeted sequencing such as targeted gene panel (TGP) or exome sequencing (WES). Most tools that support targeted sequencing utilise read-depth based algorithms, which attempt to identify CNVs through local changes in sequencing depth [25]. Targeted sequencing thus limits the detection of copy-neutral structural variants, which typically require algorithms such as paired-end mapping [26]. PKD1 falls within a segmentally duplicated region [27], and such regions have been reported as challenging for current CNV calling tools [26]. Despite this, the efficacy of calling PKD and renal-disease related CNVs from targeted sequencing data has been demonstrated [15, 28, 29]. However, to date, a large study with a focus on NGS-called CNVs in PKD is lacking.

Here, we set out to assess the effectiveness of an NGS-driven pipeline for identifying CNVs to increase the diagnostic utility of WES and TGP. We also assess the accuracy of calling CNVs from NGS and investigate the impact of non-diagnostic CNVs in cystic kidney genes on measured clinical outcomes of PKD.

Patients and methods

Patient inclusion criteria

Patients were recruited from Beaumont Hospital, Dublin, Ireland as a part of the Irish Kidney Gene Project (IKGP) [30]. To qualify for inclusion, patients had to be clinically diagnosed with PKD, either using the unified criteria for ultrasonography diagnosis of ADPKD or by evidence of >5 cysts [31], and had to have WES or TGP data available. Participants must have undergone CT, MRI or ultrasound imaging of their abdomen to assess for kidney and liver cysts.

During recruitment, patients were given an individual ID and a family ID to allow for segregation analysis of identified variants. Individuals with a known familial connection (i.e. through a pedigree) were assigned the same family ID.

Clinical records were used to gather information on age at kidney failure, height-adjusted total kidney volume (htTKV), and presence or absence of liver cysts. Kidney failure was defined as the need for dialysis or pre-emptive kidney transplantation. HtTKV was estimated by calculating the total kidney volume from imaging data and correcting for the height of the patient [32]. The occurrence of liver cysts was evaluated based on the imaging report by an expert clinical radiologist.

Short-read sequencing data

Sequencing of the patients was performed on whole blood as a part of the IKGP. Of the 378-patient cohort, 168 had WES (Twist Human Core Exome with RefSeq kit, sequenced on NovaSeq 6000, 2 x 100 bp reads, “low coverage” batch with average ~101x coverage (s.d. ±30.9x) and “high coverage” batch with average ~117x coverage (s.d. ±27.5x)), 100 had TGP (custom Roche SeqCap EZ Choice on MiSeq or NextSeq, 2 x 150 bp reads with average ~222x coverage [17]), and 110 had both WES and TGP. Sequencing data were aligned to the GRCh38 build of the human genome. All patients had previously undergone SNV (SNP and indel) analysis using the GATK4 pipeline as described previously [17].

CNV calling

Both the GATK GermlineCNVCaller and ClinCNV were trialled for CNV calling. While both tools performed similarly, we decided to report the results for ClinCNV due to ease of pipeline execution and flexible parameters. CNVs were called from WES and TGP using ClinCNV version 1.18.3 (ref. 33). To prevent spurious CNV calls, the TGP cohort was analysed as one batch, and the WES cohort was split into two batches, as one WES batch was sequenced at a higher coverage than the other. The annotation of the BED file (containing the sequencing target regions) and calculation of coverage for each sample was done using ngs-bits (https://github.com/imgag/ngs-bits) with default parameters according to the ClinCNV GitHub. The mergeFilesFromFolder.R script was used to merge the coverage files, and the clinCNV.R script was run with --scoreG 50 for balanced sensitivity and --minimumNumOfElemsInCluster 15 to account for a smaller cohort size. The resulting TSV file of CNVs was converted into VCF format using an in-house bash script. All scripts used are available on GitHub (https://github.com/FutureNeuroIE/clincnv-pipeline).

Annotation and Filtering

The VCFs were annotated using the Ensembl Variant Effect Predictor (VEP) version 107 (ref. 34). The StructuralVariantOverlap plugin with the 1000 Genomes Project data was used to annotate each CNV with overlapping known structural variants where the overlap was >80%. The --mane_select flag was used to identify transcripts affected by the CNV.
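The >80% overlap criterion can be illustrated with a minimal Python sketch. It assumes the overlap is measured as the fraction of the called CNV covered by the reference variant, using half-open coordinates; the plugin's exact definition may differ, and the function names and coordinates here are hypothetical.

```python
def overlap_fraction(cnv_start, cnv_end, ref_start, ref_end):
    """Fraction of the called CNV covered by a reference structural variant.

    Assumption: half-open intervals and the called CNV as the denominator.
    """
    shared = max(0, min(cnv_end, ref_end) - max(cnv_start, ref_start))
    length = cnv_end - cnv_start
    return shared / length if length > 0 else 0.0


def passes_overlap(cnv, ref, threshold=0.80):
    """True if the reference variant covers more than `threshold` of the CNV."""
    return overlap_fraction(cnv[0], cnv[1], ref[0], ref[1]) > threshold


# Hypothetical coordinates for illustration only
called_cnv = (1_000, 2_000)
mostly_covered = passes_overlap(called_cnv, (1_100, 2_500))   # 900 of 1000 bp shared
half_covered = passes_overlap(called_cnv, (1_500, 3_000))     # 500 of 1000 bp shared
```

A reciprocal-overlap definition (requiring the threshold in both directions) would be stricter; which one applies depends on how the plugin is configured.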

The 1000 Genomes Project annotation was used to gather the number of rare (allele frequency (AF) < 1%) CNVs per sample, which were then plotted onto a QQ plot to determine if the distribution was normal. Normality was confirmed in all CNV calling batches, which allowed for filtering out outliers via z-scoring. Z-scores were calculated for each sample and those with a score >3 within their own CNV calling run were considered low quality and excluded from further analysis.
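As a rough illustration of this sample-level filter, the following Python sketch computes a z-score for each sample's rare-CNV count within one calling batch and flags samples scoring above 3. The sample IDs and counts are invented, and the use of the sample standard deviation and a one-sided cut-off are assumptions rather than details confirmed by the text.

```python
from statistics import mean, stdev


def flag_outlier_samples(rare_cnv_counts, z_threshold=3.0):
    """Return IDs of samples whose rare-CNV count has a z-score above the threshold.

    rare_cnv_counts maps sample ID -> number of rare (AF < 1%) CNVs in one batch.
    """
    counts = list(rare_cnv_counts.values())
    mu = mean(counts)
    sigma = stdev(counts)  # assumption: sample standard deviation
    if sigma == 0:
        return []  # all samples identical, nothing to flag
    return [
        sample
        for sample, n in rare_cnv_counts.items()
        if (n - mu) / sigma > z_threshold
    ]


# Hypothetical batch: 19 typical samples plus one with an excess of rare CNV calls
batch = {f"S{i:02d}": 10 + (i % 3) for i in range(19)}
batch["S99"] = 60
outliers = flag_outlier_samples(batch)
```

Because z-scores are computed within each batch, the same raw count can be acceptable in one batch and an outlier in another.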

For the remaining samples, CNVs were filtered for genes in the PanelApp cystic kidney disease v4.6 list and for overlap with the MANE select transcript. We identified high quality CNV calls by filtering for the following ClinCNV quality information: qvalue < 0.05, loglikelihood:no_of_regions ratio >5, and potential_AF < 0.1. CNV calls were excluded if they occurred in highly polymorphic regions in the GRCh38 build of the human genome, specifically the major histocompatibility complex (MHC) region on chromosome 6 (chr6:28,510,120-33,480,577) and the leucocyte receptor complex (LRC) region on chromosome 19 (chr19:54,025,634-54,910,145).
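The call-level quality filter can be sketched in Python as follows. The thresholds and excluded coordinates are those stated above; the `CnvCall` class, its field names, and the example calls are hypothetical, and treating any overlap with an excluded region as grounds for exclusion is an assumption.

```python
from dataclasses import dataclass

# Highly polymorphic regions excluded from analysis (GRCh38, as given in the text)
EXCLUDED_REGIONS = [
    ("chr6", 28_510_120, 33_480_577),   # MHC
    ("chr19", 54_025_634, 54_910_145),  # LRC
]


@dataclass
class CnvCall:
    chrom: str
    start: int
    end: int
    qvalue: float
    loglikelihood: float
    no_of_regions: int  # assumed to be >= 1 for any reported call
    potential_af: float


def overlaps_excluded(call):
    """True if the call overlaps any excluded region (assumption: any overlap counts)."""
    return any(
        call.chrom == chrom and call.start <= end and call.end >= start
        for chrom, start, end in EXCLUDED_REGIONS
    )


def is_high_quality(call):
    """Apply the qvalue, loglikelihood:no_of_regions, and potential_AF thresholds."""
    return (
        call.qvalue < 0.05
        and call.loglikelihood / call.no_of_regions > 5
        and call.potential_af < 0.1
        and not overlaps_excluded(call)
    )


# Hypothetical calls for illustration only
good = CnvCall("chr16", 2_100_000, 2_110_000, 0.001, 60.0, 4, 0.01)
in_mhc = CnvCall("chr6", 30_000_000, 30_050_000, 0.001, 60.0, 4, 0.01)
weak = CnvCall("chr16", 2_100_000, 2_110_000, 0.001, 12.0, 4, 0.01)
kept = [c for c in (good, in_mhc, weak) if is_high_quality(c)]
```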

Validation of CNVs and estimation of ClinCNV sensitivity and specificity

CNVs called by ClinCNV in PKD1 or PKD2 were validated using SALSA MLPA Probemixes P351 PKD1 and P352 PKD1-PKD2. In total, 50 samples had MLPA testing (Leipzig, Germany); sixteen individuals with a suspected PKD1/PKD2 CNV were chosen for MLPA (with at least one person per family sent for testing), and 34 individuals were chosen at random, prioritising unsolved samples and an equal number of samples with WES or TGP available to assess the accuracy of each NGS type in detecting PKD1 or PKD2 CNVs. Array-CGH was conducted on 5 samples using an Agilent 180k array. Samples for array-CGH were similarly chosen by prioritising unsolved samples and those containing large (>50 kb) CNVs detectable by the array.

For array-CGH, sensitivity calculations were restricted to high quality CNV calls, defined as those spanning >10 probes and >50 kb covering at least one coding exon from any gene from the sequenced data. Sensitivity was calculated as (True Positive)/(True Positive + False Negative) for array and MLPA results, and specificity was calculated as (True Negative)/(True Negative + False Positive) for MLPA results.
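These two definitions translate directly into a short Python sketch. The sensitivity example uses the 5 of 6 array-CGH CNVs reported in the Results; the true-negative and false-positive counts in the specificity example are placeholders chosen for illustration, not values taken from the study.

```python
def sensitivity(true_positive, false_negative):
    """(True Positive) / (True Positive + False Negative)."""
    return true_positive / (true_positive + false_negative)


def specificity(true_negative, false_positive):
    """(True Negative) / (True Negative + False Positive)."""
    return true_negative / (true_negative + false_positive)


# 5 of 6 qualifying CNVs detected, as reported for the array-CGH comparison
array_sensitivity = sensitivity(true_positive=5, false_negative=1)

# Placeholder counts (assumption): no false positives among 34 negative samples
example_specificity = specificity(true_negative=34, false_positive=0)
```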

Classification of CNVs

CNVs impacting cystic kidney genes that passed quality criteria (see “Methods”: Annotation and Filtering) were assigned ACMG classifications based on the CNV ACMG guidelines [35]. We considered CNVs classified as “likely pathogenic” or “pathogenic” by ACMG guidelines as “diagnostic”. CNVs classified as non-pathogenic or CNVs where only one allele of a recessive gene was impacted were deemed “non-diagnostic”.

Regression analyses

Presence of CNV analysis

The association between clinical outcomes (age at kidney failure, presence of liver cysts, and htTKV) and an additional CNV was tested using regression models for each outcome, correcting for sex and diagnostic variant type i.e., PKD1-T, PKD1-NT, and PKD2. Individuals with a diagnostic PKD1 or PKD2 CNV deletion were coded as a PKD1-T or PKD2 variant, respectively. The presence of a non-diagnostic CNV in a cystic kidney gene was coded as 0 or 1, with 1 being the existence of a quality non-diagnostic CNV call in a cystic kidney gene. R (version 4.1.2) was used to conduct regression model analyses. The appropriate regression model was used depending on the type of data tested: Cox proportional hazards model (coxph()) for age at kidney failure, a logistic regression model (glm()) for presence of liver cysts, and a linear regression model (lm()) for htTKV (in mL/m). Assumptions of the models were tested. The diagnostic variant variable violated the proportional hazards assumption in the Cox model, which was corrected for with strata() in R. Cluster robust standard errors were calculated using coeftest() from the lmtest R package (or cluster() from the survival package for the Cox model), clustering by family ID.

CNV burden analysis

CNV burden analysis focused on the subset of participants with WES (n = 278). CNV burden was calculated as the total length in 100 kb of the CNVs in each respective group, which was calculated for exome-wide, cystic kidney gene regions, and non-cystic kidney gene regions. Cystic kidney gene burden was calculated as the total length of CNVs impacting genes in the PanelApp Cystic kidney disease gene list, and the non-cystic gene burden was calculated by totalling together all other CNVs that did not impact any cystic kidney genes. Statistical significance was determined based on Bonferroni corrected p value < 0.05 (correcting for 3 regions × 3 CNV types = 9 tests, significance threshold p < 0.006) within each series of tests, correcting for diagnostic variant type and sex as previously. Assumptions of the models were tested and corrected for as previously.
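The burden calculation and the multiple-testing threshold can be illustrated with a minimal Python sketch. The dictionary layout, the example CNVs, and the use of end minus start as CNV length are assumptions made for illustration.

```python
def burden_100kb(cnvs, cnv_type=None, cystic=None):
    """Total CNV length in units of 100 kb, optionally restricted by type and region.

    cnv_type: "DEL", "DUP", or None for all CNVs.
    cystic: True for cystic kidney gene CNVs, False for non-cystic, None for exome-wide.
    Assumption: length is end - start.
    """
    total_bp = sum(
        c["end"] - c["start"]
        for c in cnvs
        if (cnv_type is None or c["type"] == cnv_type)
        and (cystic is None or c["cystic"] == cystic)
    )
    return total_bp / 100_000


# Hypothetical CNV calls for one participant
sample_cnvs = [
    {"start": 1_000_000, "end": 1_250_000, "type": "DUP", "cystic": False},  # 250 kb
    {"start": 5_000_000, "end": 5_050_000, "type": "DEL", "cystic": False},  # 50 kb
    {"start": 2_100_000, "end": 2_120_000, "type": "DUP", "cystic": True},   # 20 kb
]

exome_wide_all = burden_100kb(sample_cnvs)
non_cystic_dup = burden_100kb(sample_cnvs, cnv_type="DUP", cystic=False)
cystic_dup = burden_100kb(sample_cnvs, cnv_type="DUP", cystic=True)

# Bonferroni threshold for 3 regions x 3 CNV types = 9 tests
bonferroni_threshold = 0.05 / (3 * 3)
```

Dividing 0.05 by 9 gives approximately 0.0056, which rounds to the 0.006 threshold quoted above.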

Power analysis

Study power was calculated using the powerMediation.VSMc.logistic() function in the powerMediation package in R, with alpha = 0.05. The correlation variable (corr.xm) was calculated by taking the R-squared value from a linear regression testing the burden variable as the outcome, with sex and diagnostic variant as covariates.

CNVRuler region analysis

CNVRuler [36] was used to define CNV regions and to conduct a logistic regression to determine the association between the CNV regions and the clinical outcome. The “CNVR” method was used to group the CNVs into CNV regions. The gain/loss option was selected to determine the association separately between duplications and deletions.

Gene ontology term enrichment analysis

A gene set enrichment analysis was carried out using the Gene Ontology Consortium GO enrichment analysis tool [37] with the GO aspect “biological process” selected. Genes impacted by duplications in non-cystic regions were acquired from the VEP annotations (1599 genes). Genes impacted by any CNV were also acquired similarly (2431 genes). The list of non-cystic genes impacted by duplications was run in the GO enrichment analysis tool against all annotated human genes (20,580 genes, provided by the tool), as well as the genes impacted by any CNV in our cohort.

Results

Clinical description of study cohort

Three hundred and seventy-eight patients met the study inclusion criteria (see “Methods”), of which 371 (across 213 families) passed the CNV quality control criteria (see “Methods”) and were carried forward as the cohort for analysis.

The median age for kidney survival of these 371 patients was 49 years, with 64% of the cohort having reached kidney failure at last follow up. Liver cyst data was available for 90.8% of the cohort, of which 78.6% had imaging-confirmed liver cysts. 86.5% of patients had a genetic diagnosis using standard SNV NGS analysis, with a PKD1 diagnostic variant present in 77.3% of cases. A clinical breakdown of the cohort is provided in Table 1.

Table 1 Clinical description of whole cohort, and comparison between those with and without an additional CNV.

Diagnostic CNVs in PKD1 and PKD2

ClinCNV-called CNVs were first evaluated for pathogenic deletions in PKD1 and PKD2. Seven families were identified with a heterozygous deletion in either gene (Table 2). In these families, twelve affected individuals were positive for a deletion using ClinCNV, and one affected individual was negative. To confirm ClinCNV deletions, MLPA was conducted on at least two affected individuals per family, which confirmed all deletions detected by ClinCNV. The affected individual with a negative ClinCNV result was positive for a diagnostic CNV by MLPA, confirming that all 13 individuals harboured a pathogenic CNV in PKD1 or PKD2. All deletions were classified as likely-pathogenic/pathogenic by ACMG criteria [35]. The identification of CNVs with this workflow increased the diagnostic yield from 86.5% (for SNVs only) to 90.0% (when CNVs were included).
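The change in diagnostic yield can be checked with simple arithmetic, sketched here in Python. The count of 321 SNV-solved patients is not stated in the text; it is back-calculated from 86.5% of 371 and should be read as an assumption, as should the premise that all 13 CNV-positive individuals were previously unsolved.

```python
COHORT_SIZE = 371     # patients passing CNV quality control (from the text)
SOLVED_BY_SNV = 321   # assumption: back-calculated from 86.5% of 371
SOLVED_BY_CNV = 13    # individuals with a diagnostic PKD1/PKD2 deletion (from the text)


def diagnostic_yield(solved, total):
    """Percentage of the cohort with a genetic diagnosis."""
    return 100.0 * solved / total


yield_snv_only = diagnostic_yield(SOLVED_BY_SNV, COHORT_SIZE)
yield_with_cnv = diagnostic_yield(SOLVED_BY_SNV + SOLVED_BY_CNV, COHORT_SIZE)
```

Under these assumptions the two values round to 86.5% and 90.0%, consistent with the reported increase.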

Table 2 Diagnostic CNVs identified in PKD1 or PKD2.

When comparing results from ClinCNV and MLPA, some discrepancies were noted in the exons impacted by the called CNV. For example, in family PED1, ClinCNV called a deletion of exons 27–30 in PKD1. MLPA of this family indicated a smaller deletion, limited to exons 29–30. This is because the MLPA probemix lacked a probe for exon 28, and the drop in sequencing coverage observed in IGV suggests the deletion occurred after the probe for exon 27 (Fig. 1). Similarly, visualisation of the sequencing in IGV for individual PED2-I1 showed that their PKD1 deletion breakpoints can be resolved to exon 15, and just after exon 39 (Supplementary Fig. S1). However, MLPA reported a deletion from exons 14–35, and ClinCNV reported a deletion from exons 16–39 (Table 2).

Fig. 1: Family PED1 with a heterozygous deletion in PKD1 causing ADPKD.

A Pedigree of Family PED1 indicating individuals with sequencing data. B Visualisation in IGV showing a control, i.e. an unrelated individual with no deletion, and the 3 individuals in PED1 with the PKD1 deletion in exons 27–30. Approximate location of probes from the MLPA probemix are denoted by the red triangles on the exon track. The beginning of the deletion (teal arrows) can be seen by the sudden drop in read coverage within exon 27.

Expanding the search for CNVs in other cystic kidney disease genes

Having established the utility of ClinCNV for identifying pathogenic CNVs in PKD1 and PKD2, calling of CNVs was expanded to other cystic kidney disease associated genes (see “Methods”: Annotation and Filtering). Such CNVs were identified in 21 individuals, all of whom harboured a previously identified diagnostic PKD variant in addition to the CNV (Supplementary Table S1). Four patients had two additional CNVs, three of whom had a PKD1 duplication detected by ClinCNV and verified by MLPA. The majority (22/25, 88%) of the non-diagnostic CNVs found were duplications. Five non-diagnostic CNVs span the entirety of the impacted gene, all affecting NPHP1. Nearly all additional CNVs were classified as variants of uncertain significance (VUSs) according to ACMG criteria. The exception to this was a deletion in TMEM231, deemed not diagnostic due to its heterozygous genotype, as the gene is disease-causing under an autosomal recessive model [38]. Hereafter we refer to these additional CNVs as “non-diagnostic”.

Validation of CNVs and Estimation of ClinCNV Sensitivity and Specificity

With the identification of CNVs in cystic kidney genes, MLPA and array-CGH were employed to assess the accuracy of ClinCNV calls. First, we assessed the sensitivity and specificity of ClinCNV in detecting CNVs in PKD1 and PKD2 in the cohort. To this end, we MLPA-tested 50 samples: 34 suspected negative and 16 suspected positive, including those with the diagnostic PKD1/PKD2 deletions (see “Diagnostic CNVs in PKD1 and PKD2” above) (Supplementary Table S9). Results indicated a 62.5% sensitivity and 100% specificity of ClinCNV in detecting PKD1 or PKD2 CNVs when considering calls from the combined WES and TGP datasets. WES sensitivity and specificity were 61.9% and 100%, respectively, and TGP sensitivity and specificity were 63.6% and 100%, respectively. Discordance between MLPA and ClinCNV was mainly due to PKD1 duplications (8/9 CNVs), which appear to drive the lower sensitivity. The other discordance was due to a small single-exon PKD2 deletion, which is at the lower limit of detection by read-depth based callers [25]. Sensitivity was 90.9% when assessing only deletions, compared with 42.9% when assessing only duplications.

To assess the performance of ClinCNV on CNVs larger than 50 kb, 5 samples with qualifying CNVs (see “Methods”) were tested using array-CGH. This analysis suggested 83% (5/6 CNVs) sensitivity of ClinCNV for regions fulfilling the criteria outlined in “Methods” (Supplementary Table S2).

Impact of CNVs in cystic kidney genes on clinical outcomes

To determine the extent to which CNVs act as modifiers of PKD, we applied regression models to investigate the association between CNVs and (1) age at kidney failure, (2) presence or absence of liver cysts, and (3) htTKV. As these regressions required adding the diagnostic variant type (PKD1-T, etc., see “Methods”) as a covariate, only ADPKD individuals were tested, as the ADPKD variant type groups had sufficient numbers for testing. Two types of regression models were conducted: (1) testing the presence of a CNV in a cystic kidney gene as a covariate and (2) testing the burden of CNVs (in 100 kb) as a covariate. Burden was quantified across the exome, stratifying into cystic and non-cystic regions, to determine whether the amount of the exome that the CNVs covered was associated with ADPKD outcomes (see “Methods”).

The presence of a CNV in a cystic kidney gene was not associated with any clinical outcome (Supplementary Table S3). Duplication burden in cystic genes was associated with worse kidney survival (HR = 1.56; 95% CI: 1.26–1.93, adj-p = 0.0004), with this model having 43.0% power to detect this effect. Additionally, regression models indicated a significant association of exome-wide duplication burden with the presence of liver cysts (OR = 0.82; 95% CI: 0.73–0.91, adj-p = 0.006) (Fig. 2). Stratification into cystic and non-cystic regions showed that the association was driven by the burden of duplications impacting non-cystic genes. Power analysis indicated this model had 95% power to detect an effect of the observed size.

Fig. 2: Forest plots depicting the impact of CNV burden on PKD.

The effect sizes for the burden of each CNV type are shown for age at kidney failure (n = 254) (A), htTKV (n = 95) (B), and presence of liver cysts (n = 228) (C). No data is available for cystic deletion burden as there were too few observations with cystic deletion burden to create reliable models. CNV burden is stratified by region (exome-wide, cystic kidney genes only, non-cystic kidney genes) and CNV type (all CNVs, deletions only, duplications only). The adjusted p-value is the value after Bonferroni correction.

The non-cystic duplication burden result prompted us to investigate whether certain regions of the genome were driving the observed effect. We therefore conducted gene set enrichment analysis to determine if there was overrepresentation of certain GO terms, comparing the non-cystic genes impacted by duplications to (1) all annotated human genes (20,580 genes) and (2) the list of genes impacted by any type of CNV in our samples (2431 genes). In both cases, keratinisation (GO:0031424) was the most significantly enriched GO term (p < 0.005) (Supplementary Tables S4 and S5). We then implemented the CNVRuler tool [36] to determine if a specific CNV region was driving this association. Although no associations with liver cysts remained significant after Bonferroni correction, the top region (OR = 0.13; 95% CI: 0.04–0.42, Bonferroni corrected p = 0.052) (Supplementary Table S8) was a duplication that contained exons 5–8 of LRP5L (chr22:25,351,794-25,360,092), a pseudogene of LRP5, a monogenic cause of polycystic liver disease [39]. This region occurred in 7/39 (17.9%) individuals without liver cysts and 6/189 (3.2%) individuals with liver cysts. A nucleotide BLAST indicated the pseudogene LRP5L had 90.5% similarity to 31% of LRP5. According to InterProScan (accession: IPR000033), the hypothetical protein for LRP5L contained low-density lipoprotein receptor class B repeats, which encode beta-propeller protein domains that are important for the function of its parent gene, LRP5 [39].

Discussion

In this study, we applied ClinCNV to NGS data to call CNVs across cystic kidney genes in a cohort of 371 PKD patients. ClinCNV and subsequent testing by MLPA revealed PKD1 or PKD2 deletions in 13 individuals, increasing our diagnostic yield from 86.5% to 90.0%, with high specificity (100%) but lower sensitivity (62.5%) in comparison with MLPA. Additionally, CNV burden testing indicated an association of duplications in genes unrelated to cystic kidney disease to a lower likelihood of liver cysts, which may in part be driven by a duplication affecting LRP5L.

WES and TGP data are routinely used for molecular diagnosis of PKD and while a causal variant is found for most patients, 13.5% of our cohort remained unsolved when using only SNV analysis. The identification of pathogenic variants is important for management of disease, determining likely prognosis, and aiding in the patient’s family planning [40]. It is known that approximately 1–5% of patients have ADPKD caused by CNVs in key genes such as PKD1 and PKD2, yet the identification of these variants relies on a separate clinical pipeline requiring an expensive MLPA test.

The utilisation of a bioinformatics-based CNV detection tool with existing NGS data allowed the detection of disease-causing CNVs in PKD1 or PKD2. This has both clinical and cost implications, as the NGS data generated for these patients can be used for both SNV analysis and CNV detection, confirming the diagnosis for these patients at a similar specificity to MLPA. This contributes to maximising the utility of NGS data generated for each patient and improves efficiency in achieving a molecular diagnosis. The CNV caller struggled with small (≤1 exon) deletions, a limitation that has also been reported for other read-depth based CNV callers [41]. This emphasises the importance of having multiple family members with PKD sequenced, as the CNV caller was able to detect other family members who harboured a single-exon deletion. The probes available in any given MLPA probemix should be considered, as highlighted by Fig. 1. MLPA suggests that the deletion occurs in exons 29–30, whereas visualisation of the sequencing data clearly shows the breakpoint of the CNV occurring in exon 27; this is due to the start position of the deletion occurring after the location of the probe, and the lack of a probe in exon 28. Therefore, CNV detection using NGS data may also be beneficial in cases where sufficient probe coverage is not available in MLPA, or an MLPA kit is not available for the gene of interest. Since the completion of this study, we have discovered an additional 2 patients with diagnostic CNVs in PKD1 or PKD2 in our clinical cohort by utilising this pipeline, demonstrating the applicability of this workflow.

The majority (22/25) of additional non-diagnostic CNVs found in cystic kidney genes were duplications. Dosage sensitivity is a known pathogenic mechanism for specific genes, but for duplications impacting other genes, or where the breakpoints are intragenic, it can be more challenging to infer pathogenicity. Intragenic duplications may disrupt the reading frame of a gene, as most occur in tandem and in direct orientation to the original locus [42]. However, in most cases it is unclear whether a normal allele is preserved, as was the case in this study. As a result, intragenic duplications are classified as pathogenic only if they occur within established haploinsufficiency genes [35]. The results of our regression analysis indicated an association of cystic gene duplications with worse kidney survival. This may suggest that while these duplications are not pathogenic or disease-causing in our patients, they may compound the effect of a diagnostic PKD variant on kidney survival. However, the power analysis suggests we may not have been adequately powered to detect this effect, so replication in a larger cohort will be necessary.

The CNV burden analyses also suggested that duplication burden in the exome is associated with a significantly lower likelihood of the presence of liver cysts, and a GO term enrichment analysis suggested that genes involved in keratinisation may be driving this association. CNVRuler indicated that a duplicated region impacting the gene LRP5L is associated with the absence of liver cysts, though this does not survive correction for multiple testing. Examples exist of CNV burden associating with disease phenotypes [43, 44], but generally burdens are associated with increased risk of disease. “Protective” CNVs in specific regions have been identified in other diseases, but the exact mechanism of protectiveness is relatively unclear [45, 46]. LRP5L is a pseudogene of LRP5, a key gene involved in polycystic liver disease [39]. LRP5L is predicted to produce a protein at the transcript level according to UniProt, and is not an OMIM disease gene, unlike its parent gene LRP5. LRP5 is involved in the Wnt signalling cascade, which has been shown previously to be involved in keratinisation [47]. Keratins have been noted for their protective effects in the liver, helping hepatocytes cope with stress and injury [48]. LRP5L has been identified in keratinocytes as well [49]. Most strikingly, the parent gene LRP5 has previously been postulated to be involved in liver cyst development in ADPKD patients [50]. The lack of knowledge about LRP5L limits us from drawing any further conclusions; however, speculatively, if LRP5L is in fact translated into a protein with functional domains similar to those of its parent gene, this may explain the effect seen in our results. Replication of these results in a larger cohort will be necessary.

In summary, our results demonstrate how WES and TGP data can be used to successfully call CNVs, both diagnostic and non-diagnostic. This has potential clinical and cost impacts, as NGS data generated previously for SNP and indel analysis can be repurposed for CNV detection in PKD patients.