Introduction

Latent BK polyomavirus (BKPyV) infection is highly prevalent in healthy populations, with an infection rate as high as 82% [1, 2]. Acute BKPyV nephropathy is an important risk factor for graft loss [3]. In addition, recent evidence points to an association between BKPyV and urothelial carcinoma, particularly in immunocompromised patients after transplantation [4]. The integration of BKPyV DNA into the human genome and overexpression of large T antigen (LTAg) may play independent or synergistic roles in the development of carcinoma of the urinary system after transplantation.

Previous studies have uncovered the carcinogenicity of LTAg expressed by nonintegrated viruses such as Simian Virus 40 and BKPyV [5, 6, 7]. However, little is known about the correlation between BKPyV DNA integration and urothelial carcinoma. Kenan et al. reported two cases of BKPyV-related cancer after transplantation and found that the integration occurred in the viral protein 1 (VP1) region, which encodes a protein inhibiting LTAg in a feedback loop [8, 9]. Nevertheless, this mechanism needs further confirmation, and other mechanisms remain to be explored.

In this study, by comparing matched virome capture sequencing results from the primary and metastatic tumors in three cases, we aimed to investigate the role of viral DNA integration in BKPyV-associated oncogenesis and disease progression.

Results

Clinical data and pathological results

All three patients developed high-grade, high-stage urothelial carcinoma, two in the graft kidney (Cases 1 and 2) and one in the bladder (Case 3), with metastasis 7, 8, and 12 years after renal transplantation. The clinical presentation of Case 1 has been reported in our previous study [10], but in this study we focused on the consensus integration sites of both the primary and metastatic tumors. By immunohistochemistry, the tumor cells in the primary and metastatic tumors were all positive with an antibody against SV40-LTAg, which is highly homologous to the BKPyV LTAg, indicating a BKPyV-associated etiology (Fig. 1). The tumor cells from all six samples (one primary tumor and one metastasis in each patient) were subjected to next-generation sequencing.

Fig. 1: Pathological findings of Cases 1–3.

Hematoxylin and Eosin (HE) stains of tumor tissue: (A1, A3, B1, B3 and C1, C3). Immunohistochemistry with antibodies against SV40-LTAg: (A2, A4, B2, B4 and C2, C4). Case 1: (A1 and A2) Urothelial carcinoma in allograft kidney; (A3 and A4) Liver metastasis. Case 2: (B1 and B2) Urothelial carcinoma in allograft kidney. (B3 and B4) Nodal metastasis. Case 3: (C1 and C2) Urothelial carcinoma in bladder; (C3 and C4) Nodal metastasis.

Case 1

A 57-year-old woman with chronic renal failure caused by chronic glomerulonephritis received a kidney transplant in 2009 from a donor who had died in a traffic accident. The patient had a long-term BKPyV infection. In April 2014, BKPyV was first detected in her urine with a viral load of 3.31 × 10^10 copies/mL, and immunosuppression was then reduced. In November 2016, the patient presented with hematuria accompanied by pain in the right upper quadrant. Space-occupying lesions of the liver were found by ultrasound and computed tomography (CT) scans. In February 2017, the patient again developed painless gross hematuria, accompanied by recurrent low-to-moderate fever and gradually increasing serum creatinine. The peak serum creatinine reached 7.5 mg/dL and the plasma BKPyV load was 1.20 × 10^5 copies/mL. Ultrasound and CT scans showed multiple enlarged lesions in the graft and liver, as well as multiple lymph node metastases. Graft nephrectomy was performed in March 2017, and the tumor was pathologically diagnosed as poorly differentiated urothelial carcinoma (Fig. 1A1, A2, A3, A4). After removal of the allograft and withdrawal of immunosuppression, BKPyV in blood and urine turned negative and the metastases regressed spontaneously [10]. The patient is now waiting for another renal transplant.

Case 2

A 61-year-old man received a kidney transplant for end-stage renal disease from autosomal dominant polycystic kidney disease in 2009 at Massachusetts General Hospital. A year later, the patient developed BKPyV-associated nephropathy. After decreases in immunosuppression and treatment with leflunomide, with periods of undetectable BKPyV in the blood, his allograft function remained stable with creatinine between 1.9 and 2.3 mg/dL. In 2017, he presented with hematuria, graft pain, pyuria, and an increase in baseline creatinine from 1.9 to 2.5 mg/dL. CT indicated a large number of invasive masses in the renal pelvis of the allograft. Percutaneous biopsy of the allograft showed an infiltrating carcinoma with LTAg expression. The patient underwent radical nephroureterectomy of the allograft and lymph node dissection. The pathological diagnosis was poorly differentiated urothelial carcinoma of the renal pelvis with invasion of the renal parenchyma and metastasis to multiple lymph nodes (Fig. 1B1, B2, B3, B4). After removal of the allograft and cessation of immunosuppression, the metastasis was further treated and stabilized by radiation and immune checkpoint therapy. The patient was clinically disease free 1 year after surgery.

Case 3

A 76-year-old woman with end-stage renal disease secondary to systemic lupus erythematosus received a kidney transplant in 2006 at Massachusetts General Hospital. In March 2018, the patient presented with a persistent increase in uric acid and serum creatinine (baseline: 2.6 mg/dL). In addition, urine BKPyV DNA, which had been negative in 2015, turned positive. Ultrasound showed hydronephrosis of the transplanted kidney and bladder wall thickening. Abdominal and pelvic CT scans showed a 2.4 × 4.2 × 4.3 cm tumor in the right anterior bladder wall. Cystoscopy and biopsy showed five to ten tumors in the bladder. Radical cystectomy was performed after cystoscopic biopsy. Pathological examination of the radical cystectomy specimen showed high-grade invasive papillary urothelial carcinoma infiltrating the uterus with metastasis to two lymph nodes (T4N2M0 according to the 8th edition of the AJCC Cancer Staging Manual) (Fig. 1C1, C2, C3, C4). The patient died of metastasis and metastatic ascites 7 months after surgery.

Next-generation sequencing results of primary and metastatic tumors

We performed both virome capture sequencing and whole genome sequencing (WGS) on each sample. By WGS, the average sequencing depth ranged from 10× to 60×. WGS found two and one integration sites in the primary and metastatic tumors of Case 1, respectively; none and one in Case 2; and none in either tumor of Case 3. In contrast, the average depth of virome capture sequencing was generally over 1000×, with a peak depth of 7853×, and significantly more integration sites were identified (t test, P < 0.05), as shown in Table 1. The results indicate that virome capture sequencing has a significantly stronger capacity for detecting integration sites than WGS (Fig. 2).

Table 1 Viral integration profile of the samples of BKPyV-associated urothelial carcinoma by virome capture sequencing.
Fig. 2: Number of detected integration sites in each sample by WGS and virome capture sequencing.

The black columns represent the number of integration sites detected by WGS, while the gray columns represent the number of integration sites detected by virome capture sequencing. Significantly more integration sites were identified by virome capture sequencing (t test, P < 0.05). WGS, whole genome sequencing.

The integration sites identified by virome capture sequencing are summarized in Supplementary Table S1. As a control for sequencing specificity, two separate cases of non-BKPyV-associated bladder urothelial carcinoma (negative LTAg expression by immunohistochemistry), with clinicopathological characteristics similar to Cases 1–3 with respect to age, sex, tumor grade, and stage, were sequenced by virome capture sequencing, and no integration site was detected.

Pattern of viral integration in primary and metastatic tumors

In our previous study, we proposed microhomology-mediated or nonhomologous end joining as a potential integration mechanism in a patient with primary BKPyV-associated urothelial carcinoma [11]. In the present study, a total of 332 integration sites were found, 84% of which had micro-homologous fragments in the flanking region, that is, short DNA segments shared between human and viral DNA (Fig. 3a), confirming microhomology-mediated end joining as the dominant integration pattern and the main manner of viral genome integration into human chromosomes. The mean length of the micro-homologous fragments was 7.2 bp (standard deviation = 3.1 bp). At the other integration sites, either DNA fragments of unknown origin were inserted or human and viral sequences were joined directly, indicating nonhomologous end joining as the minor integration mechanism (Table 1; Fig. 3b).
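The distinction between the two junction types can be illustrated with a minimal sketch. It assumes a simplified junction record (the `Junction` class, its field names, and the `min_homology` cutoff of 2 bp are hypothetical and are not taken from the study's pipeline) and assumes the breakpoint is called at the leftmost possible position, so that any bases shared by the human reference and the viral sequence appear as a common prefix. The toy sequences are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Junction:
    human_ref_downstream: str  # human reference bases just past the breakpoint
    viral_at_breakpoint: str   # viral bases in the read, starting at the breakpoint
    inserted: str = ""         # untemplated bases between human and viral parts


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading bases that are identical in both sequences."""
    n = 0
    for x, y in zip(a.upper(), b.upper()):
        if x != y:
            break
        n += 1
    return n


def classify(j: Junction, min_homology: int = 2) -> str:
    """Label a junction MMEJ or NHEJ; min_homology is an assumed cutoff."""
    if j.inserted:
        return "NHEJ"  # foreign sequence inserted at the junction
    k = shared_prefix_length(j.human_ref_downstream, j.viral_at_breakpoint)
    return "MMEJ" if k >= min_homology else "NHEJ"


# Toy junctions (made up for illustration, not data from the study).
junctions = [
    Junction("ACGTTGCA", "ACGTTAAA"),                 # 5 shared bases
    Junction("GGCATCTA", "GGCTTTTT"),                 # 3 shared bases
    Junction("ACGTACGT", "TTTTTTTT"),                 # direct joining
    Junction("ACGTACGT", "ACGTACGT", inserted="GG"),  # foreign insertion
]

labels = [classify(j) for j in junctions]
mmej_lengths = [
    shared_prefix_length(j.human_ref_downstream, j.viral_at_breakpoint)
    for j, label in zip(junctions, labels)
    if label == "MMEJ"
]
mmej_fraction = labels.count("MMEJ") / len(labels)
mean_length = mean(mmej_lengths)
print(labels, mmej_fraction, mean_length)
```

Applied to real junction calls, the same two summary values would correspond to the reported proportion of microhomology-mediated sites and the mean microhomology length.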

Fig. 3: Patterns of viral integration.

a Two viral integration patterns were identified. One is microhomology-mediated end joining, the dominant mechanism, characterized by a consensus sequence (black letters on red background) shared by human and virus at the integration site and flanking fragment. The other is nonhomologous end joining, the minor mechanism, in which either the human DNA sequence (black letters) is directly joined to the viral DNA sequence (red letters) or a foreign sequence (green letters) is inserted at the integration site. The three examples are from sequencing results of the metastatic tumor sample of Case 3. b Proportion of microhomology-mediated end joining integrations among the 332 integration sites of the three cases (each with primary and metastatic tumors), with different colors denoting different cases. The radii represent the average length (bp) of the micro-homologous sequence of the microhomology-mediated end joining integrations.

Distribution of viral integration sites and affected genes

We found that three BKPyV subtypes (GenBank number: AB217919.1/IVb-1, DQ989813.1/Ib-1, and AB211369.1/Ib-1) were integrated into the human genome in these three patients respectively.

The distribution of viral DNA breakpoints was significantly nonrandom (χ2 test, P < 0.05): rather than being distributed evenly in proportion to the length of each gene region, breakpoints tended to cluster in the large T gene, small T gene, and viral protein 2 (VP2) gene.

In the human genome, however, integration sites were distributed across all chromosomes except the Y chromosome, as shown in Fig. 4a, b, and there was no significant bias among chromosomes (χ2 test, P > 0.05). Of the 332 breakpoints detected, 153 were located within host gene regions (χ2 test, P < 0.05), with 10 in exons, 133 in introns, and 10 in upstream, downstream, 5′ or 3′ untranslated regions. The remaining 179 sites were located in intergenic regions, mostly surrounding gene regions. In addition, the number of supporting reads per integration site varied from 10^0 to 10^5, indicating integration in subclones of the tumor and tumor heterogeneity with respect to viral integration.
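The length-proportional null hypothesis behind these χ2 tests can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the region names, region lengths, and breakpoint counts are placeholders invented for the example, and significance is judged against tabulated 5% critical values rather than a computed P value.

```python
# Upper-tail critical values of the chi-square distribution at alpha = 0.05,
# indexed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_by_length(observed, lengths):
    """Goodness-of-fit statistic with expected counts proportional to length."""
    total = sum(observed)
    total_len = sum(lengths)
    expected = [total * length / total_len for length in lengths]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Placeholder regions, lengths (bp), and breakpoint counts; NOT study data.
regions = ["region_A", "region_B", "region_C", "region_D", "region_E"]
lengths = [2000, 500, 1000, 1000, 500]
observed = [60, 20, 10, 5, 5]

stat, df = chi_square_by_length(observed, lengths)
nonrandom = stat > CRITICAL_05[df]
print(f"chi2 = {stat:.2f}, df = {df}, nonrandom at 5% level: {nonrandom}")
```

The same function applies to the chromosome-level comparison by passing per-chromosome breakpoint counts and chromosome lengths, although the critical-value table would need to be extended to the larger number of degrees of freedom.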

Fig. 4: Distribution and corresponding supporting reads of the BK polyomavirus (BKPyV) integration sites in three cases.

a The human genome (upper semicircle) was divided into 24 regions according to the length of the chromosomes in a counterclockwise order from chromosomes 1 to 22, the X chromosome and the Y chromosome. Each chromosome has a unique color. The BKPyV genome (lower semicircle) was tagged with the number of bases from 1 to 5141 bp in a clockwise order, with gene regions marked peripherally (the NCCR is located between the small T gene and the agno gene and is not marked). Each integration site is represented by a curve connecting its location in the human genome and in the viral genome. b–e The human genome (blue circle) was divided into 23 regions according to the length of the chromosomes in a counterclockwise order from chromosomes 1 to 22 and the X chromosome. The 133 gene regions are located sequentially on the corresponding chromosomes, with each bar on the circle denoting the logarithm of the supporting reads of the integration sites. For consensus integration sites of the primary and metastatic foci, the supporting reads were summed before the logarithm was taken. b All cases; c Case 1. All genes involved in the primary tumor have consensus sites in the metastasis in this case; d Case 2; e Case 3.

Consensus integration sites and affected genes

We detected three, one, and one consensus integration sites between the primary and metastatic tumors in Case 1 (LINC01924), Case 2 (eIF3c), and Case 3 (NEIL2), respectively (Table 2). The integration sites were validated by Sanger sequencing. Intriguingly, the supporting reads of the consensus sites in all cases, whether in the primary foci or the metastases, were remarkably enriched, far exceeding the sum of those of the non-consensus sites in the same samples (Fig. 4c–e).
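How consensus sites and their read enrichment might be computed is sketched below. The sketch assumes each sample is represented as a mapping from an integration site, keyed by (human chromosome, human position, viral position), to its supporting-read count, and that a consensus site requires an exact coordinate match in both samples; the coordinates, counts, and function names are invented for illustration and are not data or code from the study.

```python
from math import log10


def consensus_sites(primary, metastasis):
    """Sites present in both the primary and the metastatic sample."""
    return sorted(set(primary) & set(metastasis))


def consensus_dominates(sample, shared):
    """True if consensus-site reads exceed the sum of all other sites' reads."""
    shared_reads = sum(sample[site] for site in shared)
    other_reads = sum(n for site, n in sample.items() if site not in shared)
    return shared_reads > other_reads


def log_combined_reads(primary, metastasis, site):
    """Sum the reads of a consensus site across both samples, then take log10."""
    return log10(primary[site] + metastasis[site])


# Toy samples: (chromosome, human position, viral position) -> supporting reads.
primary = {("chr8", 1000, 200): 9000, ("chr2", 500, 3000): 12}
metastasis = {("chr8", 1000, 200): 1000, ("chr5", 42, 100): 3}

shared = consensus_sites(primary, metastasis)
print(shared)
print(consensus_dominates(primary, shared), consensus_dominates(metastasis, shared))
print(log_combined_reads(primary, metastasis, shared[0]))
```

In practice a small positional tolerance around each breakpoint may be preferable to exact matching, since breakpoint calls can shift by a few bases within a microhomology.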

Table 2 Consensus integration sites between the primary and metastatic tumors.

Discussion

Previous studies investigating the relationship between polyomavirus and cancer mainly focused on the carcinogenic effects of proteins such as LTAg and small T antigen (STAg) expressed by nonintegrated virus [10, 12, 13]. Our study indicated an additional mechanism of BKPyV-associated urothelial carcinoma, besides the carcinogenesis of LTAg.

Compared with WGS, virome capture sequencing detected a larger number of viral integration sites and also improved the reliability of the sequencing. We therefore believe that viral DNA capture sequencing, which uses viral nucleic acid sequences as probes, is far more effective than WGS in obtaining the profile of viral integration. In addition, the virome capture sequencing technique furthers our understanding of the pattern and significance of viral integration, as well as the tumorigenic mechanism of BKPyV.

The number of supporting reads varied among integration sites, indicating that viral integration at multiple sites did not occur simultaneously but rather continuously, in a multi-staged manner, throughout cancer initiation, progression, and metastasis.

In this study, we further elucidate the pathologic “driver” integration sites in the primary and metastatic tumors, inspired by “driver” mutation and “passenger” mutation [14]. Similar to “driver” mutation, the integration that significantly affects the expression of host genes, changes the signal pathway, improves the metabolic level of the host cells and promotes the invasiveness of cancer cells is defined as the driver integration [15, 16]. Those that do not are termed “passenger” integration.

In the setting of high viral load, random integration of viral sequences is sustained in the host cells, which may become immortalized and proliferate further once a driver integration occurs. Afterwards, another integration may occur on top of the original driver integration in a small group of cancer cells, which would gain additional advantages in survival and proliferation. Under such circumstances, the genetic information of the cancer cells is passed on to the next generation as cellular proliferation continues. On the other hand, cells with a passenger integration or without any new-onset integration are lysed by the nonintegrated virus, cleared by the host immune system, or gradually eliminated during the rapid proliferation of cancer tissue. Consequently, cell clusters with a new integration status continuously replace the old ones. Tumor progression thus follows Darwinian evolution, that is, natural selection. The clinical samples in our study can be regarded as a snapshot of this process of natural selection. This explains the presence of a large number of different integration sites, their significant clustering in gene regions, and the presence of at least one consensus integration site shared by the metastases and the primary foci (Fig. 5). The tumors in the three reported cases may be at different generations but have probably experienced at least one driver integration, which resulted in one consensus integration site at two distant watch points.

Fig. 5: The role of viral integration in multi-staged progression of tumor.

Different colors of the cell nucleus denote different viral integration statuses. Nuclei with a green background and diverse white stripes represent cells with new integrations at other sites on top of an original integration. Blue nuclei represent normal cells without any viral integration, while red, purple, and black nuclei represent cells with different integration sites. The background of cells with passenger integration or without new integration is blurred; these cells are at a disadvantage and will disappear sooner or later depending on their volume. Normal cells with blue nuclei cannot be identified by virome capture sequencing at the watch points because they have no integration sites.

Because of the limited sample size, there is still no strong evidence for genes with a high frequency of BKPyV integration, such as has been shown for human papillomavirus integration at chromosomal fragile sites [17]. However, two genes, the eukaryotic translation initiation factor 3 subunit C (eIF3c) gene and the Nei-like DNA glycosylase 2 (NEIL2) gene, attracted our attention in two cases of this study. Both genes were involved in consensus integration fragments with very high read counts in the primary and metastatic tumors. eIF3c encodes a protein that regulates protein expression and is a component of translation initiation factors [18]. NEIL2 is involved in human DNA repair, which is relevant to the microhomology-mediated BKPyV genome integration [11]. These two genes have also been shown to be highly related to the occurrence and development of prostate cancer, bladder cancer, and other cancers [19, 20].

The number of supporting reads of the viral integration fragments related to the three affected genes exceeded the sum of the supporting reads of all the other integration sites in the same sample. This finding indicates that viral integration in these genes was present in most of the cancer cells in both the primary tumor and the metastatic tumor, the latter being regarded as a natural cell cluster with apparent physical and genetic distance from the primary tumor. It is therefore very likely that the driver integration happens at an early stage of the cancer; its impact on the expression of proto-oncogenes and tumor suppressor genes and on normal cell signaling pathways is worth further study.

While analyzing the human-virus integration fragments, we unexpectedly found that the inserted viral fragments themselves were not all continuous sequences (Supplementary Fig. S1 and Table S2), with most of the discontinuities located in the noncoding control region (NCCR) and VP1 region. Both the NCCR and VP1 regions are hypervariable regions of the BKPyV genome. The VP1 region encodes the main capsid protein of BKPyV, which is an important antigenic epitope of the virus [21]. The NCCR contains sequences spanning the origin of DNA replication and sequences involved in both early and late gene transcriptional regulation. In addition, integration-related point mutation, deletion, duplication, and rearrangement in this region result in significant heterogeneity among BKPyV subtypes, which is consistent with previous studies of other polyomaviruses such as Merkel cell polyomavirus [22, 23]. Muller et al. reported that the deletion of a 17 bp segment in the NCCR of the BKPyV subtype numbered AB211371.1 could significantly increase the expression of early BKPyV genes such as LTAg and STAg [24]. Kenan et al. held that low expression of VP1 resulting from an incomplete VP1 region inhibited viral replication and increased the expression of LTAg [8, 9]. Therefore, the probable viral DNA rearrangement we found, although limited, is closely related to integration, suggesting the generation of a new BKPyV subtype and a mechanism of LTAg overexpression in cancer cells [25].

However, the reason why the integration breakpoints of BKPyV clustered in the large T gene, small T gene, and VP2 gene remains to be studied. Alternatively, the integrated viral fragments may have potential biological functions that participate in natural selection, as has been suggested for hepatitis B virus [26].

To summarize, when the viral load persists in patients with poor immunosurveillance, BKPyV integration may happen to affect the expression of key genes such as eIF3c and NEIL2, which we call driver integration. As a result, the dysregulation of basic cellular functions such as autophagy, DNA repair, and protein expression could act synergistically with LTAg and other viral oncoproteins, promoting carcinogenesis [27, 28]. Once a tumor has occurred, we recommend discontinuing the immunosuppressants or removing the graft as soon as possible in order to quickly restore host immunity. In addition, the heterogeneous and numerous integration sites indicate abundant neoantigen production in the tumor cells, providing a basis for immune checkpoint-based therapy.

Materials and methods

Patients

The research was approved by the ethical committee of Nanfang Hospital (NFEC-2020-044) and the Institutional Review Board at Massachusetts General Hospital. The study was performed in accordance with the Declaration of Helsinki. Three patients with metastatic urothelial carcinoma after renal transplantation from 2016 to 2019 were included, and informed consent was obtained from all subjects. Formalin-fixed paraffin-embedded tissues from the primary and metastatic tumor samples were used for DNA extraction and sequencing.

Histopathology and immunohistochemistry

Histological examination and standard diagnostic immunohistochemistry for the characterization of the tumor were performed following routine pathology procedures.

Immunohistochemical detection of polyomavirus-associated antigens was performed on formalin-fixed paraffin-embedded tissue, with mouse monoclonal antibodies directed against the SV40-LTAg (ab16879, Abcam, Cambridge, MA).

Virome capture next-generation sequencing

Genomic DNA was sheared into fragments of approximately 200 bp by ultrasonication and used to construct a DNA library with kits provided by MyGenostics (MyGenostics Inc, Chongqing, China). BKPyV-targeting single-stranded DNA probes were developed by MyGenostics. A total of 12,400 probes, each 120 bp long, were designed to cover 319 gene regions of BKPyV genomes. The capture reaction system consisted of the constructed DNA library, 1,000 ng; specific blocking agent BL, 12 μL; viral DNA probe, 5 μL; and hybridization buffer, 19 μL. Enrichment products of the library were collected by the magnetic bead method and then eluted. The polymerase chain reaction (PCR) system consisted of captured DNA, 20 μL; 2× PCR mixture, 25 μL; and PCR primer, 5 μL. The reactions were set up as follows: 98 °C, 2 min, 1 cycle; 98 °C, 20 s, 58 °C, 30 s, 72 °C, 20 s, 12 cycles; 72 °C, 5 min, 1 cycle; 4 °C, hold. Ampure XP beads were added to the PCR products so that the DNA was fully adsorbed. The Qubit dsDNA HS Assay kit was used to quantify the library products, and the length of the library fragments was determined with the Agilent 2100 Bioanalyzer system (Agilent DNA 1000 Kit). Sequencing was performed on the Illumina NovaSeq 6000 system (PE150 strategy).

Whole genome sequencing

After construction of the DNA library, biotinylated single-stranded DNA capture probes were used for target library hybridization in solution, and target segments were selected with streptavidin-coated magnetic beads. The other steps were the same as for virome capture sequencing.

Bioinformatics analysis

Adaptors and low-quality or short (<40 bp) sequences were removed by TrimGalore, and duplicate reads were removed by Picard MarkDuplicates [29]. To obtain BKPyV typing information, preprocessed reads were aligned to all BKPyV reference genomes with the Burrows-Wheeler Aligner (BWA) [30]. All BKPyV genotypes used as references are listed in Supplementary Table S3. Target types were selected using the criteria of an average sequencing depth of more than 10 and more than 50% of the genome covered at over 4×. Based on the typing information, the best match in each comparison was selected as the reference genome for BKPyV integration analysis, and the reads were then aligned with BWA-MEM to the University of California Santa Cruz (UCSC) human genome hg19 and the selected BKPyV reference genome to obtain the BAM files. First, read pairs mapping only to UCSC hg19 or only to the BKPyV reference genome were removed. Next, read pairs in which one read mapped to the BKPyV reference genome and the other to UCSC hg19 were retained, as were chimeric reads in which a single read covered both BKPyV reference and UCSC hg19 sequence. In this way, high-confidence consensus sequences near the integration sites were obtained. SVDetect was used to analyze structural variations [31].
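The read-pair filtering logic described above can be sketched in a few lines. The sketch does not parse BAM files; it assumes each read has already been reduced to a simplified record (the hypothetical `Mate` class) giving the genome of its primary alignment and, for split reads, the genome of a supplementary alignment. The class, field names, and category labels are illustrative and are not the pipeline actually used.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mate:
    ref: str                          # genome of the primary alignment: "human" or "virus"
    split_ref: Optional[str] = None   # genome of a supplementary (split) alignment, if any


def is_chimeric(m: Mate) -> bool:
    """A single read whose two aligned parts fall in different genomes."""
    return m.split_ref is not None and m.split_ref != m.ref


def classify_pair(m1: Mate, m2: Mate) -> str:
    """Assign a read pair to one of four categories."""
    if is_chimeric(m1) or is_chimeric(m2):
        return "chimeric"      # one read spans the human-virus junction
    if m1.ref != m2.ref:
        return "discordant"    # one mate in human, the other in virus
    return m1.ref + "_only"    # both mates in the same genome: discarded


# Toy read pairs (illustrative only).
pairs = [
    (Mate("human"), Mate("human")),
    (Mate("virus"), Mate("virus")),
    (Mate("human"), Mate("virus")),
    (Mate("human", split_ref="virus"), Mate("virus")),
]

counts = Counter(classify_pair(a, b) for a, b in pairs)
# Only discordant pairs and chimeric reads are kept as integration evidence.
evidence = [p for p in pairs if classify_pair(*p) in ("discordant", "chimeric")]
print(dict(counts), len(evidence))
```

Discordant pairs localize an integration to roughly the insert size, whereas chimeric reads pinpoint the breakpoint to the base, which is why both kinds of evidence are retained.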