Genomic landscape of adult testicular germ cell tumours in the 100,000 Genomes Project

Ní Leathlobhair, Máire; Frangou, Anna; Kinnersley, Ben; Cornish, Alex J.; Chubb, Daniel; Lakatos, Eszter; Arumugam, Prabhu; Gruber, Andreas J.; Law, Philip; Tapinos, Avraam; Jakobsdottir, G. Maria; Peneva, Iliana; Sahli, Atef; Smyth, Evie M.; Ball, Richard Y.; Sylva, Rushan; Benes, Ksenija; Stark, Dan; Young, Robin J.; Lee, Alexander T. J.; Wolverson, Vincent; Houlston, Richard S.; Sosinsky, Alona; Protheroe, Andrew; Murray, Matthew J.; Wedge, David C.; Verrill, Clare

doi:10.1038/s41467-024-53193-6

Open access article. Published: 26 October 2024

Nature Communications volume 15, Article number: 9247 (2024)

Abstract

Testicular germ cell tumours (TGCT), which comprise seminoma and non-seminoma subtypes, are the most common cancers in young men. In this study, we present a comprehensive whole genome sequencing analysis of adult TGCTs. Leveraging samples from participants recruited via the UK National Health Service and data from the Genomics England 100,000 Genomes Project, our results provide an extended description of genomic elements underlying TGCT pathogenesis. This catalogue offers a comprehensive, high-resolution map of copy number alterations, structural variation, and key global genome features, including mutational signatures and analysis of extrachromosomal DNA amplification. This study establishes correlations between genomic alterations and histological diversification, revealing divergent evolutionary trajectories among TGCT subtypes. By reconstructing the chronological order of driver events, we identify a subgroup of adult TGCTs undergoing relatively late whole genome duplication. Additionally, we present evidence that human leukocyte antigen loss is a more prevalent mechanism of immune disruption in seminomas. Collectively, our findings provide valuable insights into the developmental and immune modulatory processes implicated in TGCT pathogenesis and progression.

Introduction

Although testicular germ cell tumours (testicular GCT; TGCT) are rare, they are the most common malignancy in young men, with the highest incidence occurring in those aged 30–34 years¹. The two primary histological types of TGCT are seminoma and non-seminomatous germ cell tumours (NSGCT). NSGCT are typically more aggressive than seminomas, and comprise the histological subtypes embryonal carcinoma (EC), yolk sac tumour (YST), and choriocarcinoma, as well as teratoma²; often multiple histologies can co-exist within a single lesion. Unlike most other cancers, TGCTs are rarely caused solely by somatic driver mutations, but arise from failure to control the latent developmental potential of their cell-of-origin, a foetal germ cell, resulting in its reprogramming³.

There is increasing evidence that the clinical behaviour of cancer and its therapeutic response reflect underlying tumour genomics. Thus far, genome sequencing studies of TGCT^4,5,6,7 have mostly been confined to the protein-coding (exome) sequence^4,5,8,9 or to small whole-genome-sequenced cohorts¹⁰, and a comprehensive description of the whole-genome landscape of TGCT is largely missing from the literature. Notably, the largest whole genome sequencing (WGS) study of TGCT reported to date sequenced only nine postpubertal (age > 12 years) patients¹¹. Key mutational processes, such as structural variation, and their signatures have received limited attention in TGCT, and much remains to be learned about germ cell tumour evolution. To advance our understanding of TGCT, we examined 60 whole-genome-sequenced cancers from 57 TGCT patients, recruited from seven National Health Service (NHS) Genomic Medicine Centres across England and sampled as part of the Genomics England (GEL) 100,000 Genomes Project (100kGP) [GEL v12 data release]^12,13. In this work, we provide an extensive analysis of the genomic landscape, mutational processes, and clonal architecture underlying the development of adult TGCT.

Results

Overview of the Genomics England TGCT cohort

All TGCT tumour–normal sample pairs were processed through the 100kGP somatic variant analysis pipelines (tumour coverage: 95–122.7×, mean 108.7×; normal: 31.3–62.6×, mean 39.3×). We restricted our analysis to high-quality data derived from fresh frozen material: 60 tumour samples from 57 individuals (age 17–77 years (y); median 35 y), comprising 55 untreated primary and five late-stage treated metastatic TGCT and including four primary tumour regions sampled from a single participant (Fig. 1a, Supplementary Data 1). The primary tumours comprised 39 pure seminomas and 16 NSGCT, including three EC cases and one undifferentiated teratoma. As expected, age at diagnosis showed a bimodal distribution, with most seminomas diagnosed between 20 y and 40 y (Supplementary Fig. 1).

**Fig. 1: Mutational landscape of adult testicular germ cell tumours (TGCT).**

Across the GEL cohort, we identified 80,760 single nucleotide variants (SNVs), 7412 small insertions and deletions (indels), and 1865 chromosomal rearrangements (Fig. 1b, Methods). Consistent with previous reports, tumours had a uniformly low SNV rate (mean genome-wide substitution rate 0.475/Mb; range 0.095–1.62), likely reflecting the embryological origins of TGCT^4,5. No tumour displayed a hypermutated phenotype, i.e., an excessively high SNV or indel burden (maximum 1.62 SNVs/Mb; maximum 0.28 indels/Mb).
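The substitution rates above are mutation counts divided by the number of megabases examined. A minimal sketch of that arithmetic, assuming a callable genome of about 2,800 Mb and a hypermutation cut-off of 10 mutations/Mb; both values are illustrative assumptions, not parameters taken from the 100kGP pipeline:

```python
# Hypothetical callable genome size in megabases; the denominator used by
# the 100kGP pipeline is not stated in the text and may differ.
CALLABLE_GENOME_MB = 2800.0

# Assumed hypermutation cut-off (a common convention, not from the source).
HYPERMUTATION_THRESHOLD = 10.0


def mutations_per_mb(n_mutations: float, genome_mb: float = CALLABLE_GENOME_MB) -> float:
    """Return the somatic mutation rate per megabase of sequence."""
    if genome_mb <= 0:
        raise ValueError("genome size must be positive")
    return n_mutations / genome_mb


def is_hypermutated(per_mb: float, threshold: float = HYPERMUTATION_THRESHOLD) -> bool:
    """Flag a tumour whose mutation rate reaches the (assumed) threshold."""
    return per_mb >= threshold


# Example: the cohort-wide SNV count averaged over 60 tumours
mean_snvs_per_tumour = 80_760 / 60          # 1346 SNVs per tumour
rate = mutations_per_mb(mean_snvs_per_tumour)
print(f"{rate:.3f} SNVs/Mb, hypermutated: {is_hypermutated(rate)}")
```

Because the reported mean (0.475/Mb) is an average of per-tumour rates, a pooled figure like this one is only expected to be of the same order, not identical.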

Identifying subtype-specific driver mutations

Using the IntOGen pipeline, which aggregates seven complementary driver discovery algorithms, we searched for driver genes across the GEL TGCT cohort (Methods, Supplementary Data 2). Eight genes were significantly somatically mutated (KIT, KRAS, NRAS, RAC1, SPEN, EP300, KLF4, KMT2C). Consistent with The Cancer Genome Atlas (TCGA) study of TGCT⁵, KIT driver mutations defined a subset of seminomas (Fig. 2). KIT mutations clustered primarily in exon 17, in a pattern similar to that previously reported in testicular seminomas and intracranial GCTs (Supplementary Fig. 2, Supplementary Data 2)^14,15. Only one participant, who had a clinical stage II seminoma, carried multiple KIT mutations. Further analysis identified additional drivers defining distinct subgroups within the seminoma subtype. These encompassed gain-of-function mutations in the transcription factor KLF4 and the GTPase RAC1, as well as loss-of-function mutations in the histone acetyltransferase EP300 (Fig. 2). We also searched for non-coding drivers using three complementary algorithms, namely OncodriveFML¹⁶, OncodriveCLUSTL¹⁷, and ActiveDriverWGS¹⁸, but did not identify any significant non-coding elements under positive selection (Supplementary Fig. 3).

**Fig. 2: Heatmap of molecular mutations in testicular germ cell tumours (TGCT).**

To supplement our analysis of GEL tumours, we reanalysed data from TGCT cohorts in the TCGA⁵ (128 samples) and Memorial Sloan Kettering - Metastatic Events and Tropisms (MSK-MET)¹⁹ (128 samples) studies, allowing us to identify further subtype-specific coding drivers. Within the TCGA dataset, somatic mutations in 10 genes reached significance, including NOTCH1, PIK3CA, BIRC6, ARID1B, and LRP1B. Two putative driver genes, PTMA and FAT4, were identified in NSGCT subtypes and were primarily subject to loss-of-function mutations (Supplementary Data 2). GEL cohort data also supported PTMA (prothymosin alpha) as a putative driver gene; PTMA has previously been implicated in TGCT but is not currently included in the COSMIC Cancer Gene Census^20,21.

Finally, we assessed the clinical actionability of the identified driver gene mutations using the OncoKB Knowledge Base (http://oncokb.org/)²², and found that 17% (19/110) of alterations annotated by OncoKB were targetable (OncoKB Levels 1–4). Most targetable alterations (18/19) were Level 3B, i.e., predictive biomarkers considered standard of care for a different tumour type.

Cancer driver genes in focal genomic alterations

The Battenberg algorithm was used to estimate clonal and subclonal copy number across the cohort²³. Applying GISTIC2²⁴ to these profiles, we identified 29 genomic regions recurrently affected by focal amplifications and deletions (Methods, Supplementary Fig. 4, Supplementary Data 3). In addition to established recurrent copy number alterations (CNAs), including chromosome arm-level gains spanning KRAS (12p), amplifications involving KIT (4q12; 19% of cases) and MDS2 (1p36.32; 17% of cases), and deletions spanning DMRT1 (9p24.3; 37% of cases), a gene associated with testicular germ cell tumour susceptibility²⁵, we identified 26 additional novel events. Although KIT mutations appeared to be restricted to a subset of seminomas, amplifications spanning KIT were also observed in NSGCT (Fig. 2). Segments 1q21.3 (14% of cases), 7q11.23 (46% of cases), and 22q11.1 (25% of cases), spanning the oncogenes SETDB1, CDK6, and DGCR8, respectively, were recurrently amplified. Focal deletions spanning cyclin A1 (CCNA1) and the transcription factor FOXO1 (13q13.3), which is critical for successful spermatogenesis, occurred exclusively in seminomas. Notably, focal gains spanning AFP (4q13.2) were also restricted to a subset of seminomas (8/57). Although alpha-fetoprotein is a serum tumour marker typically associated with non-seminomatous germ cell tumours, previous reports have noted elevated serum AFP levels in some histologically pure seminomas^26,27. Several recurrent deletions spanned WNT signalling-related genes, including the cadherins CDH1 and CDH11, CREBBP (16q24.2), and SMAD4 (18q22.2). Mutual exclusivity analysis revealed that the most prominent driver events largely did not co-occur, although the most significant driver interactions identified were cooperating events, including PIK3CA-MCL1 amplifications and RB1-FLI1, RB1-MEN1, and MAF-SMAD4 deletions (Supplementary Fig. 5). Mutually exclusive events were identified involving KIT-MAF and PTMA-MCL1. The sole intra-chromosomal pair identified consisted of co-occurring MEN1-FLI1 deletions.
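Pairwise co-occurrence and mutual exclusivity can be illustrated with one-sided Fisher's exact tests on a 2×2 table of tumours altered in one gene, the other, both, or neither. This is a minimal sketch of a generic test, not necessarily the method behind Supplementary Fig. 5, and the counts in the example call are hypothetical:

```python
from math import comb


def hypergeom_pmf(k: int, n_a: int, n_b: int, n: int) -> float:
    """P(k tumours altered in both genes) under independence, given n_a and
    n_b altered tumours out of n (hypergeometric distribution)."""
    return comb(n_a, k) * comb(n - n_a, n_b - k) / comb(n, n_b)


def pair_test(both: int, a_only: int, b_only: int, neither: int) -> tuple:
    """Return one-sided Fisher's exact p-values (p_cooccurrence, p_exclusivity)."""
    n_a = both + a_only
    n_b = both + b_only
    n = both + a_only + b_only + neither
    k_min = max(0, n_a + n_b - n)   # smallest possible overlap
    k_max = min(n_a, n_b)           # largest possible overlap
    probs = {k: hypergeom_pmf(k, n_a, n_b, n) for k in range(k_min, k_max + 1)}
    # Co-occurrence: overlap at least as large as observed
    p_cooccur = sum(p for k, p in probs.items() if k >= both)
    # Mutual exclusivity: overlap at most as large as observed
    p_exclusive = sum(p for k, p in probs.items() if k <= both)
    return p_cooccur, p_exclusive


# Hypothetical counts for two genes across 57 tumours
p_co, p_ex = pair_test(both=1, a_only=9, b_only=12, neither=35)
print(f"co-occurrence p = {p_co:.3f}, exclusivity p = {p_ex:.3f}")
```

With many gene pairs, these p-values would normally be corrected for multiple testing before interpretation.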

A primary somatic feature of TGCT development is copy number gain of chromosome 12p, typically in the form of an isochromosome (i12p)^7,28. We observed allelic copy number profiles consistent with the presence of at least one i12p in 75% (43/57) of tumours. A subset of these (5/43; 12%) were categorised as canonical chromosomes (Supplementary Methods) but were characterised by complex rearrangements of the 12p arm. All complex i12p cases were seminomas with recurrent focal loss at 11q24.3 encompassing the ETS transcription factor FLI1 (Supplementary Data 3). Most tumours lacking the i12p event were seminomas (13/14; 93%) and instead had at least four copies of 12p. Only two samples exhibited 12q loss of heterozygosity (LOH), suggesting that most tumours had undergone duplication of chromosome 12 or a second whole genome duplication (WGD) before i12p formation, as previously described²⁹.

Hotspots of structural variation in TGCT

Using methods described by Glodzik et al.³⁰, we identified a single structural variant hotspot involving large (>100 kilobases, kb) tandem duplications (TD) and eight deletion hotspots (Supplementary Data 4). The TD hotspot, in the region chr19:55–58 Mb, spans the histone methyltransferase GLP. Interestingly, a gain-of-function mutation in the Caenorhabditis elegans Notch receptor glp-1 has been described that leads to germline tumour formation³¹. However, this hotspot did not overlap any GISTIC-defined focal amplification. Deletion hotspots associated with copy number loss were centred on chr3:60 Mb, spanning the fragile histidine triad (FHIT) gene; chr9:7–12 Mb, covering the tyrosine phosphatase PTPRD; and chr16:78–84 Mb, targeting cadherin 13 (CDH13). We observed chromothripsis in one tumour (GEL-TGCT-0056), a rare case of metastatic teratoma with somatic-type malignancy, in which a cluster of 23 structural variants arose in a single catastrophic event affecting chromosomes 7 and 17, including amplification of PPM1D (Supplementary Data 4). Canonical translocations and fusions associated with Ewing’s sarcoma and related primitive neuroectodermal tumours were not detected in this participant.

KRAS amplification on extrachromosomal DNA

We next leveraged the GEL dataset to explore the landscape of extrachromosomal DNA (ecDNA) formation in testicular cancer. EcDNA is often associated with oncogene amplification and poor clinical outcomes in many cancers³². Amplicon structures in TGCT were detected and classified using the AmpliconArchitect tool³³. Amplicons were identified in 85% (46/54) of TGCT samples (Methods, Supplementary Data 5). Single-interval amplicons ranged in size from 116 kb to 76 Mb (median 4 Mb), and over 85% (113/130) were >1 Mb. Complex rearrangements identified in at least two samples spanned the established TGCT oncogenes KRAS, MYC, and EGFR, as well as members of signalling pathways commonly dysregulated in cancer, including WNT (SOX2), RTK (PDGFRA), and the p53 pathway inhibitors CDK4/6 and MDM2 (Supplementary Data 5). KRAS was the only oncogene identified within cyclic amplicon structures (including ecDNA in one instance), and only in seminomas. Amplicons showing a signature³⁴ of formation by a breakage-fusion-bridge mechanism were also identified exclusively in seminomas. Seminomas also carried significantly more amplicon structures than NSGCT (p = 0.018; Supplementary Fig. 6).

Complete repertoire of mutational signatures

To gain insight into the aetiological basis of mutation, we extracted mutational signatures (Supplementary Figs. 7–8, Supplementary Data 6). In most tumours, the majority of single base substitutions (SBS) could be assigned to signatures SBS5/SBS40 and SBS1 (using the nomenclature established in ref.³⁵), which are thought to result from endogenous clock-like mutagenic processes (Supplementary Fig. 8); however, only SBS5 and the number of C > T mutations at NpCpG trinucleotides correlated with age (p = 4.3 × 10⁻⁸ and p = 0.02, respectively; Supplementary Fig. 9). Seminomas with mutant KIT had significantly lower SBS1 than either wild-type seminomas (p = 0.0028) or NSGCT (p = 5.5 × 10⁻⁶).
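Once reference signatures are fixed, the contribution of each signature to a tumour can be estimated by non-negative least squares (NNLS) refitting. The sketch below solves the NNLS problem with Lee–Seung multiplicative updates while holding the signature matrix fixed; it assumes NumPy, uses a made-up four-channel catalogue in place of the 96 SBS channels, and is not the extraction procedure used in this study:

```python
import numpy as np


def refit_exposures(catalogue, signatures, n_iter=2000, eps=1e-12):
    """Estimate non-negative exposures h minimising ||catalogue - signatures @ h||^2.

    catalogue:  mutation counts per channel, shape (n_channels,)
    signatures: reference signatures as columns, shape (n_channels, k)
    Uses Lee-Seung multiplicative updates with the signature matrix held fixed,
    which keep h non-negative and do not increase the squared error.
    """
    v = np.asarray(catalogue, dtype=float)
    W = np.asarray(signatures, dtype=float)
    k = W.shape[1]
    h = np.full(k, v.sum() / k)       # strictly positive starting point
    WtV = W.T @ v
    WtW = W.T @ W
    for _ in range(n_iter):
        h *= WtV / (WtW @ h + eps)
    return h


# Made-up 4-channel "signatures" (columns sum to 1), not COSMIC profiles
W = np.array([[0.7, 0.1],
              [0.1, 0.1],
              [0.1, 0.1],
              [0.1, 0.7]])
counts = W @ np.array([300.0, 100.0])   # a sample built from 300 + 100 mutations
h = refit_exposures(counts, W)
print("exposures:", np.round(h, 1), "proportions:", np.round(h / h.sum(), 3))
```

Dividing the exposures by their sum gives the proportion of mutations attributed to each signature, which is the quantity compared between groups in the text.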

Some TGCT subtypes exhibited distinct SBS patterns. SBS18, a signature linked with damage by reactive oxygen species (ROS), was detected in two tumours, both NSGCT with minor YST components. Notably, in GEL-TGCT-0038, the majority of variants were attributable to SBS18. This signature has previously been described in multiple paediatric cancers, in placental tissue, and most recently in patients with pre- and peripubertal YSTs^11,36. As expected, SBS35, a signature attributable to platinum chemotherapy exposure, was detected in two post-chemotherapy metastases. SBS31, another signature related to platinum drug treatment, was also found in a clinical stage I primary seminoma that was treated with radical orchiectomy and carboplatin after sampling. SBS32, a signature associated with azathioprine treatment³⁷ and not reported in prior TGCT studies, was detected in 11% (6/57) of participants, although no documented medical history indicated that any of them had received such treatment. Of note, a similar finding was recently reported in patients with acute myeloid leukaemia, implying that mutational mechanisms other than azathioprine exposure may contribute to SBS32³⁸. Mutational signature activity differed between clonal and subclonal mutations (Supplementary Fig. 10), with a general trend towards a lower proportion of subclonal mutations attributed to SBS5 (p < 2.2 × 10⁻¹⁶; test for trend in proportions). Analysis of the indel (ID) mutational spectra revealed a predominance of ID1 and ID2, both attributed to slippage during DNA replication³⁵. Deletion patterns characterised by ID6 and ID8, which arise from distinct mechanisms of DNA double-strand break repair³⁵, were mutually exclusive (Supplementary Data 6). Most doublet base substitution (DBS) signatures identified in TGCT were of unknown aetiology, except for those associated with tobacco smoking (DBS2) and platinum chemotherapy (DBS5).
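The "test for trend in proportions" is assumed here to be the standard chi-squared (Cochran–Armitage) test for trend, as implemented by R's `prop.trend.test`. A stdlib-only Python sketch; the ordered groups (for example, clonal versus subclonal mutations) and their counts are supplied by the user:

```python
import math


def prop_trend_test(successes, totals, scores=None):
    """Chi-squared test for trend in proportions (1 degree of freedom).

    successes[i]: count of interest in ordered group i (e.g. SBS5-attributed mutations)
    totals[i]:    total count in group i (e.g. all mutations in that group)
    scores[i]:    ordinal score for group i (defaults to 1, 2, 3, ...)
    Returns (chi_squared, p_value).
    """
    if scores is None:
        scores = list(range(1, len(successes) + 1))
    n_total = sum(totals)
    p_bar = sum(successes) / n_total           # pooled proportion
    # Score statistic: observed minus expected counts, weighted by group score
    t_stat = sum(t * (x - n * p_bar) for t, x, n in zip(scores, successes, totals))
    s1 = sum(n * t for t, n in zip(scores, totals))
    s2 = sum(n * t * t for t, n in zip(scores, totals))
    var_t = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)
    if var_t <= 0:
        raise ValueError("undefined: pooled proportion is 0 or 1, or scores are constant")
    chi2 = t_stat * t_stat / var_t
    # With 1 df, P(X > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Example from R's prop.trend.test documentation (smokers among patients)
chi2, p = prop_trend_test([83, 90, 129, 70], [86, 93, 136, 82])
print(f"X-squared = {chi2:.4f}, p = {p:.6f}")
```

On that documentation example the statistic is about 8.22 on 1 df. With very large mutation counts, as in a clonal versus subclonal comparison, p-values quickly fall below the floating-point floor that R reports as "< 2.2e-16".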

We next examined the mutational processes generating genomic rearrangements in TGCT. We first applied a recently developed framework³⁹ that classifies chromosomal instability in cancer using 21 pan-cancer copy number signatures (CN1–CN21) (Supplementary Figs. 11–12). The tetraploidy-associated signature CN2 was found in most samples, across both seminomas and NSGCT. CN1 and CN2 were also jointly attributed in a number of tumours, indicating a hyperdiploid or sub-tetraploid profile³⁹. We identified contributions from CN13–CN15, a family of signatures characteristic of specific numerical chromosomal instability, encompassing whole-arm or whole-chromosome-scale loss of heterozygosity events. CN13, which is dominated by LOH segments with a total copy number of 1, was restricted to NSGCT. Co-occurrence of CN1, CN13, and CN15 was observed in a small number of participants (3/57; 5%) whose copy number profiles showed substantial copy-neutral LOH, and only in metastatic samples or, notably, in primary cases with subsequently reported metastases, suggesting that this combination of signatures may be clinically relevant in TGCT.
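Copy number signatures are fitted to counts of segments classified by total copy number (TCN), heterozygosity, and segment length. The sketch below assigns allele-specific segments to a 48-channel scheme modelled on our understanding of the framework in ref. 39; the bin edges, boundary handling, and label strings are assumptions for illustration. In this labelling, CN13 would be dominated by channels of the form `1:LOH:*`:

```python
from collections import Counter

SIZE_BINS = ["0-100kb", "100kb-1Mb", "1Mb-10Mb", "10Mb-40Mb", ">40Mb"]
HOMDEL_SIZE_BINS = ["0-100kb", "100kb-1Mb", ">1Mb"]


def size_bin(length_bp: int, homdel: bool = False) -> str:
    mb = length_bp / 1e6
    if mb < 0.1:
        return "0-100kb"
    if mb < 1:
        return "100kb-1Mb"
    if homdel:
        return ">1Mb"                 # homozygous deletions use coarser bins
    if mb < 10:
        return "1Mb-10Mb"
    if mb < 40:
        return "10Mb-40Mb"
    return ">40Mb"


def tcn_bin(tcn: int) -> str:
    if tcn <= 2:
        return str(tcn)
    if tcn <= 4:
        return "3-4"
    if tcn <= 8:
        return "5-8"
    return "9+"


def classify_segment(major: int, minor: int, start: int, end: int) -> str:
    """Label an allele-specific segment as 'TCN:heterozygosity:size'."""
    if major < 0 or minor < 0 or end <= start:
        raise ValueError("invalid segment")
    tcn = major + minor
    if tcn == 0:
        return f"0:homdel:{size_bin(end - start, homdel=True)}"
    state = "LOH" if min(major, minor) == 0 else "het"
    return f"{tcn_bin(tcn)}:{state}:{size_bin(end - start)}"


def all_channels() -> list:
    channels = [f"0:homdel:{s}" for s in HOMDEL_SIZE_BINS]
    channels += [f"{c}:het:{s}" for c in ["2", "3-4", "5-8", "9+"] for s in SIZE_BINS]
    channels += [f"{c}:LOH:{s}" for c in ["1", "2", "3-4", "5-8", "9+"] for s in SIZE_BINS]
    return channels   # 3 + 20 + 25 = 48


# Hypothetical segments: (major, minor, start, end)
segments = [(1, 0, 0, 50_000_000), (2, 2, 60_000_000, 65_000_000),
            (0, 0, 70_000_000, 70_050_000)]
catalogue = Counter(classify_segment(*s) for s in segments)
print(catalogue)
```

The resulting per-tumour channel counts form the matrix to which signatures such as CN1, CN2, and CN13–CN15 are fitted.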

Next, we classified structural rearrangements into subclasses according to their type and size (Methods), applying the same statistical framework used for other classes of mutational signatures⁴⁰. This approach revealed two structural variant signatures (S1, S2) (Supplementary Figs. 13–14), present in both seminomas and NSGCT. Signatures S1 and S2 resembled recently described rearrangement reference signatures characterised by unclustered translocations (RefSig R2) and unclustered deletions up to 100 kb (RefSig R5), respectively (Supplementary Fig. 15)⁴⁰. RefSig R5 has previously been associated with BRCA2 mutations, and RefSig R2 with driver mutations in TP53⁴⁰. Although no BRCA2 mutations were detected in the GEL cohort, tumours with recurrent deletions spanning BRCA2 had a significantly higher prevalence of signature S2 rearrangements (p = 0.0015, Wilcoxon rank sum test). Other signatures associated with deficient homologous recombination repair are either absent from the GEL TGCT cohort (SBS3) or present in only a small number of cases (ID6/ID8). It is therefore unclear whether loss of BRCA2 contributes to the overall signature repertoire.
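Rearrangement signatures are commonly built from a 32-channel classification: clustered or unclustered; deletion, tandem duplication, inversion, or translocation; and, for intra-chromosomal events, one of five size bins. A minimal sketch of that classification, assuming clustering has already been called upstream; the label strings and boundary handling are our own, and RefSig R5-like events correspond to `unclustered:del:1-10kb` and `unclustered:del:10-100kb`:

```python
from collections import Counter
from typing import Optional

SV_TYPES = ("del", "tds", "inv", "tra")   # deletion, tandem dup., inversion, translocation
SIZE_BINS = ("1-10kb", "10-100kb", "100kb-1Mb", "1-10Mb", ">10Mb")
UPPER_EDGES = (10_000, 100_000, 1_000_000, 10_000_000)


def size_label(length_bp: int) -> str:
    """Size bin for an intra-chromosomal rearrangement (events < 1 kb are
    usually filtered upstream and fall into the first bin here)."""
    for upper, label in zip(UPPER_EDGES, SIZE_BINS):
        if length_bp < upper:
            return label
    return SIZE_BINS[-1]


def sv_channel(sv_type: str, clustered: bool, length_bp: Optional[int] = None) -> str:
    """Assign a rearrangement to one of 32 channels."""
    if sv_type not in SV_TYPES:
        raise ValueError(f"unknown rearrangement type: {sv_type}")
    prefix = "clustered" if clustered else "unclustered"
    if sv_type == "tra":
        return f"{prefix}:tra"           # translocations carry no size
    if length_bp is None:
        raise ValueError("intra-chromosomal rearrangements need a length")
    return f"{prefix}:{sv_type}:{size_label(length_bp)}"


def all_sv_channels() -> list:
    channels = []
    for prefix in ("clustered", "unclustered"):
        for sv_type in SV_TYPES[:3]:
            channels += [f"{prefix}:{sv_type}:{s}" for s in SIZE_BINS]
        channels.append(f"{prefix}:tra")
    return channels   # 2 * (3 * 5 + 1) = 32


# Hypothetical calls: (type, clustered, length in bp or None)
calls = [("del", False, 50_000), ("tra", False, None), ("del", False, 5_000)]
print(Counter(sv_channel(t, c, n) for t, c, n in calls))
```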

Prevalence of whole genome duplication

Whole genome duplications (WGD) are near universal in TGCT, and recent work shows that these events occur early in embryogenesis^5,11. All but one tumour (56/57) in the GEL cohort had undergone WGD (Supplementary Fig. 16). Using MutationTimeR⁴¹, we timed somatic mutations relative to copy number gains and calculated the relative timing of these gains. We then timed WGD using the ratio of clock-like mutations occurring before and after it (Methods). We observed a median of ~9 substitutions (range 0–375) occurring prior to WGD, and in seven cases we observed no pre-WGD substitutions at all, supporting early genome duplication, likely in utero (Fig. 3a). This contrasts starkly with most solid cancers, in which WGD events are broadly distributed throughout clonal evolution and are likely stochastic (Fig. 3b). However, in three cases, genome doubling was estimated to occur much later than in the rest of the cohort. One of these, an extensively metastatic GCT with a predominant EC component, carried an estimated 375 pre-duplication substitutions. The other two, both clinical stage I seminomas, were metachronous bilateral testicular tumours: one participant was first diagnosed with TGCT almost 30 years before 100kGP sample collection, and the other was diagnosed for a second time five years after sampling. The metastatic case had a history of bilateral retractile testes but no previous report of bilateral TGCT. Recent single-cell analyses suggest that neonates possess a small pool of testicular cells with characteristics of primordial germ cells (PGCs)⁴². It is therefore conceivable that PGC-like cells lingering into infancy could undergo the same WGD process.
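The idea behind timing WGD from clock-like mutations can be made concrete. If a diploid genome doubles at molecular time t (on a 0–1 scale) and mutations accrue at a constant rate per copy, mutations acquired before doubling end up on two copies (multiplicity 2) and later ones on a single copy (multiplicity 1), so t can be estimated as 2·n₂ / (2·n₂ + n₁). A simplified point estimate, assuming clonal mutations in regions with no further gains or losses; MutationTimeR itself fits a more complete model with uncertainty:

```python
def wgd_molecular_time(n_mult2: float, n_mult1: float) -> float:
    """Estimate the fraction of clonal molecular time elapsed before WGD.

    n_mult2: clonal clock-like SNVs present on 2 of the duplicated copies
    n_mult1: clonal clock-like SNVs present on 1 copy, in the same regions
    Assumes a constant per-copy mutation rate and no later gains or losses.
    """
    informative = 2 * n_mult2 + n_mult1
    if informative == 0:
        raise ValueError("no informative mutations")
    return 2 * n_mult2 / informative


def expected_counts(t: float, rate_per_copy: float, copies_before: int = 2):
    """Expected (multiplicity-2, multiplicity-1) counts for doubling at time t.

    Before doubling, each of `copies_before` copies gains rate * t mutations,
    which are then duplicated; afterwards, 2 * copies_before copies each gain
    rate * (1 - t) mutations present on a single copy.
    """
    n_mult2 = copies_before * rate_per_copy * t
    n_mult1 = 2 * copies_before * rate_per_copy * (1 - t)
    return n_mult2, n_mult1


# Round trip: simulate a WGD at 25% of molecular time and recover it
n2, n1 = expected_counts(t=0.25, rate_per_copy=100.0)
print(wgd_molecular_time(n2, n1))
```

A tumour with no multiplicity-2 clock-like mutations, like the seven cases without pre-WGD substitutions, gives an estimate of 0.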

**Fig. 3: Timing of whole genome duplication (WGD) events across Pan-Cancer Analysis of Whole Genomes (PCAWG) and Genomics England (GEL) testicular germ cell tumour (TGCT) cohorts.**

We then estimated the point during PGC development at which WGD occurred by dividing pre-duplication substitution burden estimates by the reported mutation rate per cell division in PGCs⁴³, as described in Oliver et al.¹¹ (Supplementary Methods). Excluding the late-WGD cases, WGD was estimated to occur at a median of ~11 post-PGC cell divisions in TGCT (range of lower and upper bounds, 0–71.5), placing this genetic hallmark of TGCT initiation in the developmental period. Most tumours with WGD (42/56; 75%) had synchronous chromosomal gains (Supplementary Methods, Fig. 3c), broadly in line with the distribution of gain patterns reported by the Pan-Cancer Analysis of Whole Genomes (PCAWG) in tumours with WGD⁴¹. A subset of tumours with WGD (12/56; 21%) showed asynchronous gains, which were observed only in pure seminomas or in NSGCT with a predominant EC or seminoma component, suggesting divergent patterns of chromosomal evolution underlying histogenesis (Fig. 3c). Moreover, the proportion of CNAs attributed to signature CN2 was significantly higher in samples with synchronous gain patterns (p = 0.029, Wilcoxon rank sum test), whereas the proportion attributed to CN14 was higher in asynchronous genomes (p = 0.006, Wilcoxon rank sum test).
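The conversion from pre-WGD substitutions to cell divisions is a single division per tumour, summarised by a median. A sketch with the per-division PGC mutation rate left as a parameter, because the value from ref. 43 is not stated here; the burdens and the rate in the example are hypothetical placeholders:

```python
from statistics import median


def divisions_before_wgd(pre_wgd_snvs: float, snvs_per_division: float) -> float:
    """Estimate post-PGC cell divisions before WGD from the pre-WGD SNV burden."""
    if snvs_per_division <= 0:
        raise ValueError("mutation rate per division must be positive")
    return pre_wgd_snvs / snvs_per_division


def median_divisions(burdens: dict, snvs_per_division: float, exclude=()) -> float:
    """Median estimate across tumours, skipping excluded IDs (e.g. late-WGD cases)."""
    estimates = [divisions_before_wgd(b, snvs_per_division)
                 for tumour, b in burdens.items() if tumour not in exclude]
    return median(estimates)


# Hypothetical pre-WGD burdens and a placeholder rate (NOT the value from ref. 43)
burdens = {"T1": 0, "T2": 20, "T3": 40, "late_case": 375}
print(median_divisions(burdens, snvs_per_division=2.0, exclude={"late_case"}))
```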

Relative timing of genome doubling and driver mutations in TGCT

Using a permutation approach, we identified CNAs with evidence of significant enrichment or depletion across the GEL TGCT cohort and within the seminoma and NSGCT subgroups (Methods, Supplementary Data 7). A probabilistic timing model was used to reconstruct the order of acquisition of recurrent genomic aberrations, including WGD, enriched CNAs, and putative driver mutations, across all TGCT genomes and within seminomas and NSGCT separately (Methods, Fig. 4). Enriched gains spanned known cancer and TGCT drivers, including MYC (8q11–q24), EGFR (7p11.2), and BRAF (7q34). Similarly, enriched LOH events covered tumour suppressor genes such as APC (5q22.2), ATM (11q22.3), and CDX2 (13q12.2). No evidence was found for enriched homozygous deletion events. We further identified CNAs with significant negative enrichment (depletion) in TGCT, implying that these events are less important for, or perhaps incompatible with, driving tumourigenesis in TGCT or in the context of widespread WGD (Supplementary Data 7).
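A permutation test for recurrent CNAs can be sketched by keeping each tumour's events fixed in size but placing them at random, then asking how often a genomic bin is hit at least as often (enrichment) or at most as often (depletion) as observed. The circular-shift null model below is an assumption chosen for simplicity; it ignores chromosome boundaries, and the study's exact scheme is described in its Methods:

```python
import random


def permutation_enrichment(profiles, n_perm=1000, seed=0):
    """Per-bin permutation p-values for recurrent copy number events.

    profiles: one 0/1 list per tumour (1 = the event covers that genomic bin);
    all lists have the same length. Each permutation circularly shifts every
    tumour's profile by a random offset, preserving event sizes and burden.
    Returns a list of (p_enriched, p_depleted) tuples, one per bin.
    """
    rng = random.Random(seed)
    n_bins = len(profiles[0])
    observed = [sum(p[j] for p in profiles) for j in range(n_bins)]
    at_least = [0] * n_bins   # permutations with count >= observed
    at_most = [0] * n_bins    # permutations with count <= observed
    for _ in range(n_perm):
        counts = [0] * n_bins
        for p in profiles:
            shift = rng.randrange(n_bins)
            for j in range(n_bins):
                counts[j] += p[(j + shift) % n_bins]
        for j in range(n_bins):
            at_least[j] += counts[j] >= observed[j]
            at_most[j] += counts[j] <= observed[j]
    # Add-one correction so p-values are never exactly zero
    return [((1 + at_least[j]) / (1 + n_perm), (1 + at_most[j]) / (1 + n_perm))
            for j in range(n_bins)]


# Hypothetical cohort: 20 tumours, each with a single-bin gain at bin 0 of 50
profiles = [[1] + [0] * 49 for _ in range(20)]
result = permutation_enrichment(profiles, n_perm=200)
print(result[0])
```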

**Fig. 4: Probabilistic ordering reveals most likely timing of copy number and driver events in TGCT.**

In line with our analysis of WGD developmental timing, tetraploidisation was consistently the earliest event seen, followed by 12p gains spanning the KRAS locus, which may imply an initiating tumourigenic role in adult TGCT. To more accurately estimate the timing of high-level copy number gains specific to chromosome 12, we used AmplificationTimeR⁴⁴, a method for timing individual amplification events (Supplementary Methods, Supplementary Data 7). Within most samples analysed, findings were in keeping with early timing of whole genome doubling. However, there is evidence to suggest that in some participants, chromosome 12 gains instead represent the earliest occurring events in the evolutionary history of the tumour, occurring pre-WGD (Supplementary Fig. 17).
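One standard way to turn per-tumour orderings into a cohort-level order is a Bradley–Terry model fitted to pairwise "which came first" counts. The sketch below uses Hunter's MM updates; it is a swapped-in illustration rather than the probabilistic model used for Fig. 4, and the event names and win counts are hypothetical:

```python
def bradley_terry(wins, n_iter=500):
    """Fit Bradley-Terry 'earliness' scores by Hunter's MM algorithm.

    wins[i][j]: number of tumours in which event i was timed before event j.
    Returns scores summing to 1; a higher score means the event tends to occur
    earlier. Assumes every event wins at least once and every event has been
    compared with at least one other event.
    """
    k = len(wins)
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        updated = []
        for i in range(k):
            w_i = sum(wins[i][j] for j in range(k) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(k) if j != i)
            updated.append(w_i / denom)
        total = sum(updated)
        pi = [x / total for x in updated]
    return pi


events = ["WGD", "12p gain", "KRAS mutation"]   # hypothetical events
wins = [[0, 9, 9],    # WGD before 12p gain in 9 tumours, before KRAS in 9
        [1, 0, 8],    # 12p gain before WGD in 1, before KRAS in 8
        [1, 2, 0]]    # KRAS before WGD in 1, before 12p gain in 2
scores = bradley_terry(wins)
for name, s in sorted(zip(events, scores), key=lambda x: -x[1]):
    print(f"{name}: {s:.3f}")
```

Sorting events by score gives an ordering of the kind shown in Fig. 4; a full analysis would also attach uncertainty, for example by bootstrapping tumours.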

Most of the early events following WGD were gains and showed balanced representation across TGCTs (Fig. 3b), although later enriched events were specific to TGCT subtypes, such as the 12q11 gain spanning KIF21A and restricted to NSGCT. The only CNA event uniquely enriched in seminomas was a recurrent LOH spanning BRCA2 (chr13:18–114 Mb). Whilst not statistically significant, most individuals (12/14; 86%) harbouring this event belonged to the ‘young-onset’ group (<40 years; Supplementary Methods). Additionally, a subset of enriched CNAs were specific to young-onset seminomas, such as chr8:45–129 Mb and chr7:60–159 Mb (Fig. 4c).

A Dirichlet process clustering algorithm was used to cluster SNVs and indels according to their cancer cell fraction⁴⁵. The proportion of SNVs, indels, or CNAs identified as subclonal did not vary significantly with tumour stage or subtype (Supplementary Fig. 18a, b; Supplementary Data 7). Multi-site clonality analysis of four whole-genome-sequenced regions from one participant (GEL-TGCT-0058; pure seminoma) points towards limited intra-patient tumour heterogeneity (Supplementary Fig. 18c). Across all subtypes, mutations in driver genes including KIT, KRAS, and NRAS were relatively late events, occurring after WGD and after the corresponding copy number gains or other CNAs (Fig. 4b, c). Participants with KRAS and KMT2C drivers were typically older at diagnosis.
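Clustering by cancer cell fraction (CCF) starts from converting each variant's allele fraction into a CCF using tumour purity and local copy number. A minimal sketch of that widely used conversion, assuming a diploid normal and a known mutation multiplicity; the Dirichlet process clustering step itself is not shown:

```python
def cancer_cell_fraction(vaf: float, purity: float, tumour_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Approximate the cancer cell fraction (CCF) of a somatic variant.

    vaf: variant allele fraction in the tumour sample
    purity: fraction of cells in the sample that are tumour cells
    tumour_cn: total copy number at the locus in tumour cells
    multiplicity: number of tumour chromosome copies carrying the variant
    normal_cn: copy number in contaminating normal cells (assumed diploid)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the sample
    mean_cn = purity * tumour_cn + (1 - purity) * normal_cn
    # Expected VAF of a clonal variant is purity * multiplicity / mean_cn
    return vaf * mean_cn / (purity * multiplicity)


# A tetraploid (post-WGD) locus in a sample of 50% purity: a clonal variant on
# one of four copies has expected VAF 0.5 * 1 / (0.5 * 4 + 0.5 * 2) = 1/6
print(round(cancer_cell_fraction(1 / 6, purity=0.5, tumour_cn=4), 3))
```

Because almost all tumours in this cohort are post-WGD, using the local tumour copy number rather than assuming diploidy matters when converting allele fractions.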

HLA loss enriched in seminomas

None of the 60 tumours harboured nonsynonymous mutations in human leucocyte antigen (HLA) genes (Methods). However, LOH at the HLA locus, in which either the maternal or paternal allele is lost, was identified in six tumours using the LOHHLA algorithm⁴⁶ (Supplementary Data 8). HLA LOH affected a single class I HLA gene in two seminoma cases and the HLA-A and HLA-C genes in another three. HLA LOH potentially affected all three class I HLA genes in a single case, the only affected NSGCT. In GEL-TGCT-0053, a post-chemotherapy lymph node metastasis, LOH was detected in both HLA-A and HLA-B; although LOH of HLA-C could not be established because the two HLA-C alleles were highly similar (C*07:02 and C*07:01), the ordering of HLA genes suggested that HLA-C was likely also lost. We found no significant associations between HLA homozygosity (Methods) and age at diagnosis, clinical stage, or pathological stage.

Allelic imbalance without LOH, i.e., HLA imbalance as a result of unequal copy gain at the HLA locus or LOH not reaching statistical significance, was observed in a further 17 cases and in both seminomas and NSGCT. The majority of cases where HLA imbalance was observed were in seminomas (9/17; 53%). In most other tumours with HLA imbalance, the major histological component was EC (6/17; 35%), suggesting subtype-specific mechanisms of immune disruption in TGCT. HLA or B2M mutations, which can disrupt neoantigen-MHC binding, were not observed (Methods). Scanning for somatic mutations in genes involved in antigen presentation and processing (Methods), we found one seminoma exhibiting HLA LOH had also acquired a mutation in the proteasome regulator PSME4, which plays a key role in immunoproteasome activity and generating immunopeptidome diversity. Collectively, these findings suggest that while HLA mutations are unlikely to be a major mechanism of immune evasion in TGCT overall, HLA LOH could represent a mechanism of immune disruption and/or escape, primarily in a subset of seminomas, though further study is required.

TGCT samples had a median of 10 neoantigenic mutations, mostly arising from SNVs (Supplementary Data 8). Neoantigen burden did not differ significantly between samples with HLA LOH, HLA imbalance, or an intact HLA locus. We next asked whether the LOH event affected the neoantigen landscape of tumours with HLA LOH (Supplementary Fig. 19). To do this, we computed the number of antigenic peptides predicted to bind the allele lost in the LOH event and compared it with the number of peptides predicted to bind the retained allele. We found no significant difference overall, but observed a trend in 3/6 samples (GEL-TGCT-0007, GEL-TGCT-0018, GEL-TGCT-0050) towards more binders associated with the lost than with the retained allele. In 2/6 samples we observed the opposite trend, although the difference between binders of the lost and retained alleles was less striking. These observations could suggest that in some seminomas HLA LOH provides functional escape from immune selection pressure, whereas in other samples immune selection is negligible or HLA LOH is secondary to other, non-genetic escape mechanisms.
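The lost-versus-retained comparison amounts to counting predicted binders per allele. A minimal sketch, assuming predictions are available as (peptide, allele, IC50 in nM) tuples and using the common 500 nM binder cut-off; the tuples, allele names, and cut-off are illustrative assumptions, not outputs from this study:

```python
import math


def count_binders(predictions, allele, ic50_cutoff=500.0):
    """Count distinct peptides predicted to bind `allele` below the IC50 cut-off.

    predictions: iterable of (peptide, allele, ic50_nM) tuples (hypothetical
    format, e.g. parsed from a binding predictor's output).
    """
    return len({pep for pep, a, ic50 in predictions
                if a == allele and ic50 < ic50_cutoff})


def lost_vs_retained(predictions, lost_allele, retained_allele, ic50_cutoff=500.0):
    """Return (lost_binders, retained_binders, log2 ratio with pseudocount 1)."""
    lost = count_binders(predictions, lost_allele, ic50_cutoff)
    kept = count_binders(predictions, retained_allele, ic50_cutoff)
    # The pseudocount avoids log(0) when one allele has no predicted binders
    return lost, kept, math.log2((lost + 1) / (kept + 1))


# Illustrative predictions only
preds = [("PEPTIDEA", "HLA-A*02:01", 50.0),
         ("PEPTIDEB", "HLA-A*02:01", 300.0),
         ("PEPTIDEC", "HLA-A*02:01", 900.0),
         ("PEPTIDEA", "HLA-A*01:01", 700.0),
         ("PEPTIDED", "HLA-A*01:01", 100.0)]
print(lost_vs_retained(preds, lost_allele="HLA-A*02:01", retained_allele="HLA-A*01:01"))
```

A positive log2 ratio corresponds to the pattern seen in 3/6 samples, with more binders predicted for the lost allele than for the retained one.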

Discussion

To our knowledge, this is the largest whole-genome study of adult TGCT to date. It extends the number of such reported whole genomes almost tenfold, further defines the genomic basis of molecular subtypes, and charts the typical evolutionary trajectories of these tumours.

We provide supporting evidence for 17 candidate driver genes, including subtype-specific drivers. Furthermore, we identified a putative loss-of-function driver mutation in PTMA, a gene not currently catalogued by OncoKB or the COSMIC Cancer Gene Census. Previous work has suggested that a homologue of PTMA, PTMS (parathymosin), may be implicated in GCT epigenetic remodelling¹¹. Putative drivers identified in only one TGCT cohort, such as KLF4 in GEL or FAT4 in TCGA, likely represent rare or low-frequency driver events. A recent study of childhood and adolescent GCTs reported that FAT family genes were mutated exclusively in non-seminoma subtypes⁹. The recurrent deletions and structural variant hotspots involving cadherin genes detected in this study support a potentially important role for these proteins in TGCT pathogenesis and progression: Sertoli–germ cell adhesion is crucial for spermatogenesis, and cadherins are important mediators of cell–cell adhesion in the testes⁴⁷.

Across all participants, we observed a lower burden of SBS1 relative to SBS5, with the nominally clock-like SBS1 signature entirely absent in some cases; this could reflect the age distribution of the cohort⁴⁸. Recent observations support the hypothesis that a lower SBS1 burden in normal seminiferous tubules could be due to a lower rate of spermatogonial stem cell division compared with somatic stem cells, with SBS1 and SBS5 generated at independently regulated rates⁴⁹. It has previously been postulated that SBS1 mutations may be generated during DNA replication, at the time of mitosis³⁵. The depleted contribution of SBS1 in KIT-mutated seminomas could signal that these cells are, or were, maintained in prolonged mitotic arrest.

In addition, we identified six sporadic SBS signatures. Interestingly, mutational signature SBS32, linked to chronic azathioprine exposure, was detected in ~10% of participants without any such recorded treatment history, indicating that other exposures may generate SBS32 mutations. The foetal origins of malignant GCTs raise the possibility that such exposure might even occur in utero^50,51. Exposures at different ages, or individual differences in susceptibility to mutational insult, could also underlie the bimodal age distribution at diagnosis observed in seminomas. The teratoma GCT subtype represents a terminally differentiated tissue that is typically non-responsive to chemotherapy⁵². The marked absence of signatures SBS31 and SBS35, linked with platinum compound exposure, in the post-chemotherapy teratomas reported in our study suggests that this might be due, at least in part, to the ability of non-differentiated cells to withstand the specific mutational damage normally associated with chemotherapy exposure. We report signature SBS18 in adult TGCT, exclusively in NSGCT, perhaps corresponding to damage from intrinsic ROS mechanisms⁵³ initially induced during development. Our analysis of mutational signature evolution suggests that such processes remain active during the development and progression of NSGCT subtypes. Importantly, a substantial proportion of the signatures detected in TGCT remain of unknown aetiology, highlighting the contribution of mutational processes yet to be identified.

In seminomas, low levels of tumour-infiltrating lymphocytes (TILs) have been linked to poorer patient outcomes, namely a higher clinical stage at presentation and increased relapse rates⁵⁴. Our finding that HLA LOH is almost exclusive to seminomas, and the possibility that a reduced set of antigenic peptides is presented to the immune system as a result⁴⁶, suggests a potential genomic mechanism underpinning the low-TILs seminoma subgroup. Additionally, a recent study reported a prevalence of HLA-I LOH of 16.7% in germ cell tumours, which is largely consistent with our results⁵⁵.

The chronological ordering of genomic events in TGCT corresponds with the initiation of most tumours within the gonadal developmental pathway. Consistent with the canonical model of TGCT, WGD occurs at the earliest stage, likely arising from erroneous centromeric division during mitotic anaphase, and most often precedes 12p gains⁵⁶. While other fundamental biological abnormalities of TGCT are evident in germ cell neoplasia in situ (GCNIS)⁵⁷, whether WGD is as frequent in these precursor lesions, or even in normal germ cells without any clinical manifestation, remains to be determined. Probabilistic ordering also supports the idea that the chromosomal gains and losses following early tetraploidisation of tumour cells are non-random, with specific events being favoured or suppressed during typical TGCT development. Recent analyses of WGD in ovarian adenocarcinoma suggest that tetraploidisation, though often arising early in clonal evolution, can occur throughout the female reproductive lifespan⁴¹. In male reproductive tissues, however, such events appear to be constrained to early life. The less common tumours with relatively late WGD identified here point to rare aetiologies of TGCT that require further exploration in larger cohorts.

Limitations of the present study include the relatively small sample size and the clinical homogeneity of the cohort, which is enriched for early-stage seminomas, leaving small numbers of patients for detailed subgroup analyses. Large targeted studies will add further power to analyse rarer TGCT subtypes, more aggressive forms of disease, and individuals with poorer survival outcomes. A further limitation is that we have considered only a single data modality (DNA). Although we identified putative drivers of TGCT pathogenesis, confirmation of our findings in complementary experimental work would increase confidence in their reliability. Analysis of multi-modal data (e.g., RNA, protein, and DNA accessibility) will be essential for a fuller understanding of the molecular underpinnings of TGCT initiation and progression. In addition, our analysis did not consider the potential pathogenicity of germline variants. Future studies should address these gaps. Despite these limitations, our study sheds light on the diversity of genomic processes driving TGCT oncogenesis and progression and highlights important genomic alterations that could facilitate immune evasion in specific TGCT subtypes.

Methods

Participant recruitment and consent

This study was made possible through access to data in the National Genomic Research Library, which is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). Genomics England has approval from the Health Research Authority Committee East of England – Cambridge South (REC Ref 14/EE/1112). Additional ethical oversight is provided by Genomics England’s Ethics Advisory Committee and Participant Panel. Detailed consent for participation in the 100,000 Genomes Project was obtained for all participants in line with cancer-programme specific guidance from Genomics England (https://files.genomicsengland.co.uk/forms/Cancer-Model-3.2.0.docx). Participants recruited to the study included anyone with a likely diagnosis of a testicular tumour, regardless of sex or gender, in accordance with SAGER guidelines. The sex of participants was reported by submitting clinical teams and was not determined by the research team.

Tissue collection and handling

In the 100,000 Genomes Project, tissue is collected and processed at UK National Health Service (NHS) Genomic Medicine Centre hubs (GMCs). Germline blood samples and fresh-frozen tumour biopsy samples were collected and processed according to the specifications outlined in the Genomics England Sample Handling Guidance Documentation Version 4.0 (https://files.genomicsengland.co.uk/forms/Sample-Handling-Guidance-v4.0.pdf). Formalin-fixed paraffin-embedded tumour samples were excluded from the current study.

Sample selection

Sixty whole-genome-sequenced tumour samples were analysed in this study, including four primary tumour regions sampled from a single individual. For the identification of driver mutations, recurrent copy number alterations, and amplicons, we used one sample from each of 54 of the 57 individuals. Specifically, we selected the highest-purity sample of the four multi-region samples (see Supplementary Data 1). Additionally, we excluded genomes that were not generated using PCR-free library preparation (2 of 57), as well as one metastatic case that had undergone malignant somatic transformation (GEL-TGCT-0056; see Supplementary Data 1). For more details, see Supplementary Information.
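The selection rules above can be expressed as a small filter. This is a minimal sketch, not the study's code: the `TumourSample` record, its field names (including the `transformed` flag) and the toy purities are hypothetical, and the only rules applied are those stated in the text (keep the highest-purity sample per individual, then drop non-PCR-free genomes and the transformed metastatic case). On the real cohort the same arithmetic gives 60 samples, then 57 individuals, then 54 after three exclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TumourSample:
    sample_id: str
    participant_id: str
    purity: float
    pcr_free: bool              # library prepared without PCR amplification
    transformed: bool = False   # hypothetical flag for malignant somatic transformation


def select_cohort(samples):
    """Keep one highest-purity sample per participant, then apply exclusions."""
    best = {}
    for s in samples:
        current = best.get(s.participant_id)
        if current is None or s.purity > current.purity:
            best[s.participant_id] = s
    kept = [s for s in best.values() if s.pcr_free and not s.transformed]
    return sorted(kept, key=lambda s: s.sample_id)


# Toy data: P1 has four regions; P2 is not PCR-free; P3 is transformed.
samples = [
    TumourSample("P1-R1", "P1", 0.52, True),
    TumourSample("P1-R2", "P1", 0.71, True),
    TumourSample("P1-R3", "P1", 0.64, True),
    TumourSample("P1-R4", "P1", 0.48, True),
    TumourSample("P2-T1", "P2", 0.80, False),
    TumourSample("P3-T1", "P3", 0.90, True, transformed=True),
    TumourSample("P4-T1", "P4", 0.45, True),
]
print([s.sample_id for s in select_cohort(samples)])  # ['P1-R2', 'P4-T1']
```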

Clinical data

Clinical data were collected from NHS GMCs via the central Genomics England team in line with the Genomics England Cancer Model data submission and sample tracking guidance, version 3.2.0 (https://files.genomicsengland.co.uk/forms/Cancer-Model-Sample-Tracking-3.2.0.docx).

Variant calling and filtering

Detection of germline and somatic single nucleotide variants (SNVs) and insertions/deletions (indels) < 50 bp was performed using Strelka⁵⁸ (version 2.9.9). In addition to the default Strelka filters, we applied the following filters⁵⁹, removing:

Variants with population germline allele frequency ≥1% in the 100kGP or gnomAD datasets.
Variants with excessive somatic frequency (≥5%) in the 100kGP cancer dataset. The 5% threshold was based on the frequency of recurrent non-synonymous variants in hallmark genes in the Cancer Gene Census⁶⁰.
Variants identified as simple repeats by Tandem Repeats Finder⁶¹.
Indels where ≥10% of base calls within a 50-base window on either side of the indel were flagged and filtered by Strelka due to high sequencing noise.
Variants called in regions of poor mappability where the majority of overlapping 150 bp reads do not map uniquely to the variant position.
Variants resulting from systematic mapping and calling artefacts present in both tumour and normal 100kGP sample sets. Specifically, somatic SNVs were identified where the ratio of tumour allele depths differed significantly from the ratio of allele depths at the same site in a panel of normal samples (PoN), as tested using Fisher’s exact test. The PoN comprised 7000 non-tumour genomes from the GEL dataset. Only individuals not carrying the relevant alternate allele at a particular site were used to count allele depths in the PoN. To ensure similarity to the Strelka preset filters, duplicate reads were removed and quality thresholds set at base quality ≥ 5, mapping quality ≥ 5, and phred score < 80.
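The panel-of-normals comparison in the last filter rests on a two-sided Fisher's exact test on a 2×2 table of alternate and reference read depths (tumour versus PoN). The sketch below implements that test from scratch with exact integer arithmetic. The function names and the `alpha = 0.05` threshold are hypothetical (the text gives no threshold), and the step that restricts PoN counts to individuals not carrying the alternate allele is assumed to have been done beforehand. In practice `scipy.stats.fisher_exact` computes the same two-sided p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the observed
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer numerators of P(X = x); all share the denominator comb(n, col1),
    # so comparing each table with the observed one is exact.
    weights = [comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)]
    observed = weights[a - lo]
    return sum(w for w in weights if w <= observed) / comb(n, col1)


def differs_from_pon(tumour_alt, tumour_ref, pon_alt, pon_ref, alpha=0.05):
    """True if the tumour alt:ref depth ratio differs significantly from the PoN's.

    alpha is a hypothetical threshold chosen for illustration.
    """
    return fisher_exact_two_sided(tumour_alt, tumour_ref, pon_alt, pon_ref) < alpha


# Alt reads are rare in the PoN, so the tumour ratio stands out.
print(differs_from_pon(12, 28, 3, 9000))    # True
# Same 5% alt fraction in tumour and PoN, the pattern a shared artefact gives.
print(differs_from_pon(2, 38, 200, 3800))   # False
```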

SNVs and small indels were normalized (left-aligned and trimmed, with multi-allelic variants decomposed) and annotated using CellBase with GRCh38 Ensembl v90, COSMIC⁶⁰ (version v86/GRCh38) and ClinVar⁶² (October 2018 release) databases. Variant consequences were annotated using a high-performance tool within CellBase, and only variants with one of a curated set of consequence types (e.g., stop gained/lost, start lost, frameshift, inframe insertion/deletion, missense, splice acceptor/donor, and splice region variants) in canonical transcripts were reported.
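Normalization of this kind is commonly performed with tools such as `bcftools norm` or `vt normalize`; the text does not say which implementation was used here. The sketch below is a from-scratch version of the usual left-align-and-trim procedure, written against a plain reference string with 0-based coordinates (VCF positions are 1-based). The function names are hypothetical and the code illustrates the idea only; it is not the pipeline used in this study.

```python
def normalize_variant(ref_seq, pos, alleles):
    """Left-align and trim one variant against a reference sequence.

    ref_seq: reference sequence as a string (0-based indexing).
    pos:     0-based position of the first base of alleles[0] (the REF allele).
    alleles: [REF, ALT1, ...].
    Returns (pos, alleles) in left-aligned, parsimonious form.
    """
    if ref_seq[pos:pos + len(alleles[0])] != alleles[0]:
        raise ValueError("REF allele does not match the reference sequence")
    alleles = list(alleles)
    changed = True
    while changed:
        changed = False
        # Drop a final base shared by all alleles while none is empty.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # Re-pad any empty allele with the preceding reference base (shift left).
        if not all(alleles):
            if pos == 0:
                raise ValueError("cannot left-align past the start of the sequence")
            pos -= 1
            alleles = [ref_seq[pos] + a for a in alleles]
            changed = True
    # Drop leading bases shared by all alleles, keeping at least one base each.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


def decompose_and_normalize(ref_seq, pos, ref, alts):
    """Split a multi-allelic record into normalized biallelic (pos, REF, ALT) records."""
    records = []
    for alt in alts:
        new_pos, (new_ref, new_alt) = normalize_variant(ref_seq, pos, [ref, alt])
        records.append((new_pos, new_ref, new_alt))
    return records


# A CA deletion inside a CA repeat plus an A>G change, written as one record.
print(decompose_and_normalize("TCACACAG", 4, "ACA", ["A", "ACG"]))
# [(0, 'TCA', 'T'), (6, 'A', 'G')]
```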

Driver identification

Cancer driver genes were identified using the IntOGen pipeline⁶³. The relative evolutionary timings of candidate driver mutations were obtained using MutationTimeR⁴¹.
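The principle behind timing clonal events with MutationTimeR is that, in a segment gained to major copy number 2, clonal SNVs present on two copies (multiplicity 2) arose before the gain, while those on one copy mostly arose after it. Assuming a constant mutation rate per chromosome copy, the relative counts give a point estimate of when the gain happened. The sketch below shows only this simplified closed-form estimate for 2:0, 2:1 and 2:2 segments, under the stated assumptions; MutationTimeR itself fits a more complete statistical model, and the function name here is hypothetical.

```python
def molecular_time_of_gain(n_mult1, n_mult2, minor_cn):
    """Point estimate (0 to 1) of when a clonal gain occurred, in mutation time.

    Assumes one gain event giving major copy number 2 (segment states 2:0, 2:1
    or 2:2), a constant mutation rate mu per chromosome copy over unit time, and
    clonal SNVs already assigned multiplicity 1 (n_mult1) or 2 (n_mult2).
    """
    if minor_cn not in (0, 1, 2):
        raise ValueError("sketch only handles 2:0, 2:1 and 2:2 segments")
    total_cn = 2 + minor_cn
    # 2:2 arises from duplicating both alleles; 2:0 and 2:1 from duplicating one.
    duplicated_alleles = 2 if minor_cn == 2 else 1
    # Counting copies before and after the gain gives
    # n_mult1 + 2 * n_mult2 = total_cn * mu and n_mult2 = duplicated_alleles * mu * t.
    denom = n_mult1 + 2 * n_mult2
    if denom == 0:
        raise ValueError("no clonal SNVs in this segment")
    return total_cn * n_mult2 / (duplicated_alleles * denom)


for minor in (0, 1, 2):
    print(f"2:{minor}", round(molecular_time_of_gain(30, 10, minor), 2))
# 2:0 0.4
# 2:1 0.6
# 2:2 0.4
```

With small mutation counts the estimate is noisy and can even exceed 1, which is one reason a fuller model with uncertainty estimates is used in practice.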

CNA analysis

A Nextflow pipeline, in combination with the CleanCNA R package (https://github.com/afrangou/CleanCNA), performed the following steps. Battenberg v.2.2.7 (https://github.com/Wedge-lab/battenberg) was run on all samples, and the resulting genome-wide copy number profiles, together with variant allele frequency (VAF) information from somatic SNVs, were input into two algorithms: DPClust⁴⁵ (version 2.2.5; https://github.com/Wedge-lab/dpclust) and CNAqc (https://github.com/caravagnalab/CNAqc). DPClust was used to calculate the cancer cell fraction (CCF) of individual mutations and to cluster mutations based on their CCF. CNAqc compared the expected peaks of the SNV VAF distribution with the observed peaks for each sample, in five copy number states (major:minor): 1:0, 1:1, 2:0, 2:1 and 2:2. Metrics from the Battenberg profile, DPClust output and CNAqc output were combined to provide an overall assessment of the copy number profile. Extrachromosomal DNA (ecDNA) molecules were detected from tumour BAM files using AmpliconArchitect (v1.2)³³. Recurrent arm-level copy number events, as well as focal amplifications and deletions, were identified using the GISTIC²⁴ algorithm (v2.0.2.3; https://github.com/broadinstitute/gistic2).
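Both the CNAqc peak check and the DPClust CCF calculation build on the same relation between purity, copy number and VAF. The sketch below writes that relation out in Python, assuming a clonal segment and diploid normal cells (`normal_cn = 2`). CNAqc and DPClust are R packages; these functions are hypothetical illustrations, not their APIs.

```python
def expected_vaf_peaks(major, minor, purity, normal_cn=2):
    """Expected VAFs of clonal SNVs in a major:minor segment, one per multiplicity.

    A clonal SNV on m of the tumour's (major + minor) copies has expected VAF
    m * purity / (purity * (major + minor) + normal_cn * (1 - purity)).
    Multiplicity ranges from 1 to the major allele's copy number.
    """
    denom = purity * (major + minor) + normal_cn * (1 - purity)
    return [m * purity / denom for m in range(1, major + 1)]


def cancer_cell_fraction(vaf, total_cn, purity, multiplicity=1, normal_cn=2):
    """Invert the relation above: the fraction of tumour cells carrying the SNV."""
    return vaf * (purity * total_cn + normal_cn * (1 - purity)) / (purity * multiplicity)


# Expected peaks for the five states checked, at a hypothetical purity of 0.8.
for major, minor in [(1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]:
    peaks = expected_vaf_peaks(major, minor, purity=0.8)
    print(f"{major}:{minor}", [round(p, 3) for p in peaks])
# 1:0 [0.667]
# 1:1 [0.4]
# 2:0 [0.4, 0.8]
# 2:1 [0.286, 0.571]
# 2:2 [0.222, 0.444]
```

If observed VAF peaks sit far from these expectations, the purity or copy number state is likely wrong, which is the kind of inconsistency such a quality check is designed to flag.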

Classifying whole genome duplication events

We classified samples as having undergone WGD based on tumour ploidy and the extent of loss of heterozygosity (LOH). Individual tumours (including only the highest-purity sample from multi-region sampling) were plotted by average ploidy and fraction of the genome with LOH (Supplementary Fig. 16). The line separating WGD and non-WGD tumours was estimated following the approach established by Dentro et al.⁶⁴ as y = 2.9 − 2x, where y is the average ploidy and x is the fraction of the genome with LOH; tumours above this line were classified as WGD. Most samples classified as WGD showed synchronous chromosomal gains⁴¹, assessed using the same approach as the PCAWG Consortium⁶⁴, further supporting this classification.
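The classification can be sketched as two steps: summarize a copy number profile as length-weighted ploidy and LOH fraction, then test which side of the line ploidy = 2.9 − 2 × (LOH fraction) the sample falls on. This is a minimal sketch under stated assumptions: segments are hypothetical `(start, end, major, minor)` tuples, and LOH is counted where the minor allele is lost but the major allele is retained.

```python
def ploidy_and_loh(segments):
    """Length-weighted ploidy and LOH fraction from (start, end, major, minor) segments."""
    lengths = [end - start for start, end, _, _ in segments]
    total = sum(lengths)
    ploidy = sum(
        length * (major + minor)
        for length, (_, _, major, minor) in zip(lengths, segments)
    ) / total
    # LOH: the minor allele is lost while the major allele is retained.
    loh = sum(
        length
        for length, (_, _, major, minor) in zip(lengths, segments)
        if minor == 0 and major > 0
    ) / total
    return ploidy, loh


def is_wgd(ploidy, loh_fraction):
    """WGD if the tumour lies above the line ploidy = 2.9 - 2 * LOH fraction."""
    return ploidy > 2.9 - 2 * loh_fraction


tetraploid_like = [(0, 60_000_000, 2, 2), (60_000_000, 100_000_000, 2, 0)]
diploid_like = [(0, 90_000_000, 1, 1), (90_000_000, 100_000_000, 1, 0)]
for name, segs in [("tetraploid_like", tetraploid_like), ("diploid_like", diploid_like)]:
    ploidy, loh = ploidy_and_loh(segs)
    print(name, round(ploidy, 2), round(loh, 2), is_wgd(ploidy, loh))
# tetraploid_like 3.2 0.4 True
# diploid_like 1.9 0.1 False
```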

Structural variant (SV) analysis

Somatic rearrangements were identified using a graph-based consensus approach comprising Delly⁶⁵ (version 0.7.8), Lumpy⁶⁶ (version 0.2.13), and Manta⁶⁷ (version 0.28.0), whilst also considering support from CNAs. Rearrangements were first called using the three individual callers with default parameters. Delly was run with post-filtering of somatic SVs using all normal samples. Rearrangements from the three individual callers were further filtered if any reads supporting the variant were identified in the matched normal, if < 2% of tumour reads supported the variant, or if either variant breakpoint was located in a telomeric or centromeric region or on a non-standard reference contig (i.e. not chromosomes 1–22, X or Y). Remaining rearrangements were merged with a modified version of the PCAWG SV Merge tool, which uses a graph-based approach to identify and merge rearrangements identified by multiple callers, allowing 400 bp slop for breakpoint positions⁶⁸. Rearrangements were included in the final data set if they were identified by at least two callers, or by a single caller but with a breakpoint within 3 kb of a CNA segment boundary.
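The merge-and-support rule can be sketched as a graph problem: each call is a node, two calls are joined when both breakpoints agree within the 400 bp slop, and connected components (found here with union-find) become merged rearrangements. A component is kept if at least two distinct callers contribute, or if any breakpoint lies within 3 kb of a CNA segment boundary. This is a simplified illustration, not the modified PCAWG SV Merge tool: `SVCall` and `consensus_svs` are hypothetical names, and strand orientation and SV type, which a production merger would need to check, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    caller: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def _same_event(a, b, slop):
    """Both breakpoints on the same chromosomes and within `slop` bp."""
    return (a.chrom1 == b.chrom1 and a.chrom2 == b.chrom2
            and abs(a.pos1 - b.pos1) <= slop and abs(a.pos2 - b.pos2) <= slop)


def consensus_svs(calls, cna_boundaries, slop=400, cna_window=3000):
    """Merge matching calls into components and keep well-supported ones."""
    parent = list(range(len(calls)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(calls)):
        for j in range(i + 1, len(calls)):
            if _same_event(calls[i], calls[j], slop):
                parent[find(i)] = find(j)

    groups = {}
    for i, call in enumerate(calls):
        groups.setdefault(find(i), []).append(call)

    kept = []
    for members in groups.values():
        callers = {m.caller for m in members}
        near_cna = any(
            abs(pos - boundary) <= cna_window
            for m in members
            for chrom, pos in ((m.chrom1, m.pos1), (m.chrom2, m.pos2))
            for boundary in cna_boundaries.get(chrom, ())
        )
        if len(callers) >= 2 or near_cna:
            kept.append(members)
    return kept


calls = [
    SVCall("delly", "1", 10_000, "1", 50_000),
    SVCall("manta", "1", 10_150, "1", 49_900),     # matches the Delly call
    SVCall("lumpy", "2", 5_000, "2", 9_000),       # single caller, no CNA support
    SVCall("manta", "3", 100_000, "3", 200_000),   # single caller, near a CNA boundary
]
for group in consensus_svs(calls, {"3": [102_000]}):
    print(sorted(m.caller for m in group))
# ['delly', 'manta']
# ['manta']
```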

Mutational signatures analysis

Mutational signatures were extracted from SBS, DBS, ID, copy number alteration and structural variant catalogues using SigProfilerExtractor⁶⁹ (version 1.1.3). All signature extraction runs used random initialization (nmf_init parameter), 500 NMF replicate runs (nmf_replicates parameter) and between 10,000 (min_nmf_iterations parameter) and 1,000,000 (max_nmf_iterations parameter) NMF iterations. We tested solutions with 1–30 signatures for SBS, 1–20 for DBS, 1–15 for ID, 1–30 for CN, and 1–15 for SV. In downstream analysis, only DBS signatures with > 4 mutations were retained. The optimal number of de novo signatures was chosen to minimize the mean sample cosine distance between original and reconstructed mutational profiles while keeping the average stability above 0.9.
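The rank-selection criterion can be illustrated with a few lines of Python. A lower cosine distance means the reconstructed profile is closer to the observed one, so this sketch picks, among candidate numbers of signatures whose average stability exceeds 0.9, the one with the smallest mean sample cosine distance. It is a simplified version of the criterion stated in the text, not SigProfilerExtractor's internal selection procedure. The function names and candidate values are hypothetical, and ties are broken towards fewer signatures by assumption.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two mutation-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def mean_sample_cosine_distance(observed, reconstructed):
    """Average distance between each sample's observed and reconstructed profile."""
    pairs = list(zip(observed, reconstructed))
    return sum(cosine_distance(o, r) for o, r in pairs) / len(pairs)


def choose_num_signatures(solutions, min_stability=0.9):
    """Lowest mean distance among solutions whose stability exceeds the threshold.

    solutions maps a candidate number of signatures k to
    (average_stability, mean_sample_cosine_distance).
    """
    eligible = {k: dist for k, (stab, dist) in solutions.items() if stab > min_stability}
    if not eligible:
        raise ValueError("no solution exceeds the stability threshold")
    return min(eligible, key=lambda k: (eligible[k], k))


# Hypothetical summaries for three candidate ranks; k = 4 fits best but is unstable.
candidates = {2: (0.97, 0.110), 3: (0.93, 0.060), 4: (0.81, 0.040)}
print(choose_num_signatures(candidates))  # 3
```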

Linear regression models were fitted to test the association between patients’ age and the number of mutations attributed to each signature. SNVs were assigned to SBS mutational signatures for all mutations in each sample. Briefly, trinucleotide contexts were obtained for all SNVs processed by DPClust, in which each SNV had been assigned to a cluster with CCF values allowing clonal tree reconstruction. For clonal and subclonal clusters, per-signature probabilities were obtained by multiplying the proportion of the cluster’s mutations of each mutation type by the corresponding decomposed mutational probability assigned by SigProfilerExtractor, and summing these products across mutation types.
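The per-cluster calculation amounts to a weighted sum: each mutation type contributes its share of the cluster's SNVs multiplied by the probability that a mutation of that type arose from a given signature. The sketch below assumes the per-type probabilities from SigProfilerExtractor's decomposition have already been loaded into a nested dictionary. The function name, the dictionary layout and the toy numbers are hypothetical.

```python
def cluster_signature_probabilities(type_counts, type_signature_probs):
    """Signature composition of one DPClust cluster.

    type_counts: {mutation_type: number of the cluster's SNVs of that type}.
    type_signature_probs: {mutation_type: {signature: P(signature | type)}}
        for the sample, e.g. loaded from a decomposition output.
    Returns {signature: sum over types of proportion(type) * P(signature | type)}.
    """
    total = sum(type_counts.values())
    result = {}
    for mut_type, count in type_counts.items():
        proportion = count / total
        for signature, prob in type_signature_probs[mut_type].items():
            result[signature] = result.get(signature, 0.0) + proportion * prob
    return result


# Toy cluster with two SBS96 mutation types.
counts = {"A[C>T]G": 3, "T[C>A]T": 1}
probs = {
    "A[C>T]G": {"SBS1": 0.8, "SBS5": 0.2},
    "T[C>A]T": {"SBS1": 0.0, "SBS5": 1.0},
}
print({k: round(v, 2) for k, v in cluster_signature_probabilities(counts, probs).items()})
# {'SBS1': 0.6, 'SBS5': 0.4}
```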

Kruskal-Wallis tests were used to compare structural variant signature activities among samples, grouped by tumour type (primary versus metastatic), broad disease subtypes, and detailed disease subtypes (see Supplementary Data 1).
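For reference, the Kruskal-Wallis H statistic ranks all observations together and compares the mean rank of each group, with a correction for ties. The sketch below computes the tie-corrected H in plain Python; the p-value comes from a chi-square distribution with (number of groups − 1) degrees of freedom, which `scipy.stats.kruskal` returns alongside the statistic. The function names and toy values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H; compare with chi-square on len(groups) - 1 df."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie-group sizes t.
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in tie_sizes.values())
    return h / (1 - ties / (n ** 3 - n))


# Toy signature activities for two groups (e.g. primary vs metastatic).
print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6]), 3))  # 3.857
```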

HLA LOH calling with LOHHLA

Somatic mutations in the HLA locus were predicted using Polysolver⁷⁰. Loss of heterozygosity at the HLA locus was predicted using LOHHLA⁴⁶. An HLA class I allele in a sample was annotated as showing “allelic imbalance” (AI) if the p-value for the difference in evidence between the two alleles was < 0.01. Alleles with AI were further labelled as LOH if all of the following criteria held: (i) the predicted copy number of the lost allele was below 0.5, with the upper bound of its confidence interval below 0.7; (ii) the copy number of the kept allele was above 0.75; (iii) the number of mismatched sites between the two alleles was above 10. Samples with at least one HLA gene showing LOH were labelled “HLA LOH”, and samples with no LOH but at least one gene showing AI were labelled “HLA imbalance”. Note that HLA imbalance could result from unequal copy gain in the HLA locus (e.g. CN = 2:1), but could also indicate LOH that does not reach statistical significance, for example because of low sample purity or because the LOH is subclonal in the tumour.
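The allele- and sample-level labelling rules can be written down directly as code. This is a minimal sketch, assuming the relevant quantities have already been extracted from LOHHLA output into a record. The `HlaAlleleCall` field names, the "balanced" and "no HLA imbalance" labels and the example values are hypothetical and do not correspond to LOHHLA's column names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HlaAlleleCall:
    gene: str
    imbalance_p: float     # p-value for differing evidence between the two alleles
    lost_cn: float         # predicted copy number of the putatively lost allele
    lost_cn_upper: float   # upper bound of that copy number's confidence interval
    kept_cn: float         # predicted copy number of the retained allele
    mismatch_sites: int    # number of sites that differ between the two alleles


def classify_allele(call):
    """Apply the AI threshold, then the three LOH criteria."""
    if call.imbalance_p >= 0.01:
        return "balanced"
    if (call.lost_cn < 0.5 and call.lost_cn_upper < 0.7
            and call.kept_cn > 0.75 and call.mismatch_sites > 10):
        return "LOH"
    return "AI"


def classify_sample(calls):
    """Any LOH gives 'HLA LOH'; otherwise any AI gives 'HLA imbalance'."""
    labels = {classify_allele(c) for c in calls}
    if "LOH" in labels:
        return "HLA LOH"
    if "AI" in labels:
        return "HLA imbalance"
    return "no HLA imbalance"


a = HlaAlleleCall("HLA-A", 0.001, 0.2, 0.5, 1.1, 25)   # meets all LOH criteria
b = HlaAlleleCall("HLA-B", 0.001, 0.6, 0.9, 1.4, 30)   # imbalanced, lost CN too high
c = HlaAlleleCall("HLA-C", 0.2, 1.0, 1.3, 1.0, 15)     # no significant imbalance
print(classify_sample([a, b, c]), "|", classify_sample([b, c]))
# HLA LOH | HLA imbalance
```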

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Summary information for each sample is provided in the Supplementary Data, ensuring that data do not enable the identification of participants. Primary data from the 100,000 Genomes Project are held in a secure Research Environment and are available to registered users. To become a member of the Genomics England research network and obtain access, please visit https://www.genomicsengland.co.uk/research/academic/join-gecip. The process involves an online application, verification by the applicant’s institution, completion of a short information governance training course, and final approval by Genomics England. More information is available at https://www.genomicsengland.co.uk/research/academic. The Genomics England data access agreement can be found at https://doi.org/10.6084/m9.figshare.4530893.v7 (ref. ⁷¹). WGS data and processed files from this project can be accessed by joining the Pan-Cancer and Molecular Oncology community, after data access has been approved (https://www.genomicsengland.co.uk/research/pan-cancer-and-molecular-oncology-community). All analyses of Genomics England data must take place within the Genomics England Research Environment (https://re-docs.genomicsengland.co.uk/). The 100,000 Genomes Project publication policies can be found at https://files.genomicsengland.co.uk/images/Publication-Policy-v5.0.pdf. Please see https://re-docs.genomicsengland.co.uk/data_overview/ for further information. The TCGA TGCT genomic dataset reanalysed here is available from cBioPortal via https://www.cbioportal.org/study/summary?id=tgct_tcga_pan_can_atlas_2018 (ref. ⁵). An SFTP server is available to access the PCAWG data referenced in our study. More information on accessing these data can be found at https://docs.icgc-argo.org/docs/data-access/icgc-25k-data#accessing-icgc-25k-release-data.

Code availability

The code for the WGS subclonal copy number caller can be found at https://github.com/Wedge-lab/battenberg (v.2.2.7). The code for the DPClust R package used for tumour subclonal reconstruction can be found at https://github.com/Wedge-lab/dpclust (v.2.2.5). The code for inferring the order of genomic events can be found at https://github.com/hturner/PlackettLuce. The code for the chronological timing analysis can be found at https://gerstung-lab.github.io/PCAWG-11/ and https://github.com/gerstung-lab/MutationTimeR. The code used for filtering CNAs can be found at https://github.com/afrangou/CleanCNA. The code used to identify rearrangement hotspots can be found at https://github.com/DominikGlodzik/hotspots/tree/glodzik2016/. The code used to time amplifications relative to WGD can be found at https://github.com/Wedge-lab/AmplificationTimeR.

References

Cancer Research UK. Testicular Cancer Statistics. https://www.cancerresearchuk.org/health-professional/cancer-statistics/statistics-by-cancer-type/testicular-cancer (2022).
Moch, H. et al. The 2022 World Health Organization classification of tumours of the urinary system and male genital organs—part a: renal, penile, and testicular tumours. Eur. Urol. 82, 458–468 (2022).
Oosterhuis, J. W. & Looijenga, L. H. J. Human germ cell tumours from a developmental perspective. Nat. Rev. Cancer 19, 522–537 (2019).
Litchfield, K. et al. Whole-exome sequencing reveals the mutational spectrum of testicular germ cell tumours. Nat. Commun. 6, 5973 (2015).
Shen, H. et al. Integrated molecular characterization of testicular germ cell tumors. Cell Rep. 23, 3392–3406 (2018).
Cheng, M. L. et al. Germ cell tumor molecular heterogeneity revealed through analysis of primary and metastasis pairs. JCO Precis. Oncol. 4, 1307–1320 (2020).
Taylor-Weiner, A. et al. Genomic evolution and chemoresistance in germ-cell tumours. Nature 540, 114–118 (2016).
Loveday, C. et al. Genomic landscape of platinum resistant and sensitive testicular cancers. Nat. Commun. 11, 1–12 (2020).
Xu, L. et al. Integrated genomic analysis reveals aberrations in WNT signaling in germ cell tumors of childhood and adolescence. Nat. Commun. 14, 2636 (2023).
Dorssers, L. C. J. et al. Molecular heterogeneity and early metastatic clone selection in testicular germ cell cancer development. Br. J. Cancer 120, 444–452 (2019).
Oliver, T. R. W. et al. Clonal diversification and histogenesis of malignant germ cell tumours. Nat. Commun. 13, 4272 (2022).
Turnbull, C. Introducing whole-genome sequencing into routine cancer care: the Genomics England 100 000 Genomes Project. Ann. Oncol. 29, 784–787 (2018).
Turro, E. et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature 583, 96–102 (2020).
Willmore-Payne, C., Holden, J. A., Chadwick, B. E. & Layfield, L. J. Detection of c-kit exons 11- and 17-activating mutations in testicular seminomas by high-resolution melting amplicon analysis. Mod. Pathol. 19, 1164–1169 (2006).
Wang, L. et al. Novel somatic and germline mutations in intracranial germ cell tumours. Nature 511, 241–245 (2014).
Mularoni, L., Sabarinathan, R., Deu-Pons, J., Gonzalez-Perez, A. & López-Bigas, N. OncodriveFML: A general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 17, 128 (2016).
Arnedo-Pac, C., Mularoni, L., Muiños, F., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUSTL: A sequence-based clustering method to identify cancer drivers. Bioinformatics 35, 4788–4790 (2019).
Zhu, H. et al. Candidate cancer driver mutations in distal regulatory elements and long-range chromatin interaction networks. Mol. Cell 77, 1307–1321.e10 (2020).
Nguyen, B. et al. Genomic characterization of metastatic patterns from prospective clinical sequencing of 25,000 patients. Cell 185, 563–575.e11 (2022).
Brabrand, S. et al. Exome sequencing of bilateral testicular germ cell tumors suggests independent development lineages. Neoplasia 17, 167–174 (2015).
Venditti, M., Arcaniolo, D., De Sio, M. & Minucci, S. First evidence of the expression and localization of prothymosin α in human testis and its involvement in testicular cancers. Biomolecules 12, 1210 (2022).
Chakravarty, D. et al. OncoKB: A precision oncology knowledge base. JCO Precis. Oncol. 2017, PO.17.00011 (2017).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Pluta, J. et al. Identification of 22 susceptibility loci associated with testicular germ cell tumors. Nat. Commun. 12, 1–13 (2021).
Nazeer, T. et al. Histologically pure seminoma with elevated alpha-fetoprotein: a clinicopathologic study of ten cases. Oncol. Rep. 5, 1425–1429 (1998).
Yuasa, T. et al. Detection of alpha-fetoprotein mRNA in seminoma. J. Androl. 20, 336–340 (1999).
Looijenga, L. H. J. et al. Role of gain of 12p in germ cell tumour development. APMIS 111, 161–171; discussion 172–173 (2003).
Geurts van Kessel, A. et al. Chromosome 12q heterozygosity is retained in i(12p)-positive testicular germ cell tumor cells. Cancer Genet. Cytogenet. 40, 129–134 (1989).
Glodzik, D. et al. A somatic-mutational process recurrently duplicates germline susceptibility loci and tissue-specific super-enhancers in breast cancers. Nat. Genet. 49, 341–348 (2017).
Berry, L. W., Westlund, B. & Schedl, T. Germ-line tumor formation caused by activation of glp-1, a Caenorhabditis elegans member of the Notch family of receptors. Development 124, 925–936 (1997).
Kim, H. et al. Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers. Nat. Genet. 52, 891–897 (2020).
Deshpande, V. et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat. Commun. 10, 392 (2019).
Zakov, S., Kinsella, M. & Bafna, V. An algorithmic approach for breakage-fusion-bridge detection in tumor genomes. Proc. Natl Acad. Sci. USA. 110, 5546–5551 (2013).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Coorens, T. H. H. et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85 (2021).
Inman, G. J. et al. The genomic landscape of cutaneous SCC reveals drivers and a novel azathioprine associated mutational signature. Nat. Commun. 9, 3667 (2018).
Gunnarsson, R. et al. Single base substitution and insertion/deletion mutational signatures in adult core binding factor acute myeloid leukemia. Leukemia 36, 1681–1684 (2022).
Steele, C. D. et al. Signatures of copy number alterations in human cancer. Nature 606, 984–991 (2022).
Degasperi, A. et al. A practical framework and online tool for mutational signature analyses show inter-tissue variation and driver dependencies. Nat. Cancer 1, 249–263 (2020).
Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).
Sohni, A. et al. The neonatal and adult human testis defined at the single-cell level. Cell Rep. 26, 1501–1517.e4 (2019).
Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).
Jakobsdottir, G. M., Dentro, S. C., Bristow, R. G. & Wedge, D. C. AmplificationTimeR: an R package for timing sequential amplification events. Bioinformatics 40, btae281 (2024).
Bolli, N. et al. Heterogeneity of genomic evolution and mutational profiles in multiple myeloma. Nat. Commun. 5, 2997 (2014).
McGranahan, N. et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell 171, 1259–1271.e11 (2017).
Newton, S. C., Blaschuk, O. W. & Millette, C. F. N-cadherin mediates Sertoli cell-spermatogenic cell adhesion. Dev. Dyn. 197, 1–13 (1993).
Giannoulatou, E. et al. Whole-genome sequencing of spermatocytic tumors provides insights into the mutational processes operating in the male germline. PLoS One 12, e0178169 (2017).
Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature 597, 381–386 (2021).
Strohsnitter, W. C. et al. Cancer risk in men exposed in utero to diethylstilbestrol. J. Natl Cancer Inst. 93, 545–551 (2001).
Shetty, G., Comish, P. B., Weng, C. C. Y., Matin, A. & Meistrich, M. L. Fetal radiation exposure induces testicular cancer in genetically susceptible mice. PLoS One 7, e32064 (2012).
Funt, S. A. et al. Impact of teratoma on the cumulative incidence of disease-related death in patients with advanced germ cell tumors. J. Clin. Oncol. 37, 2329–2337 (2019).
Brandsma, A. M. et al. Mutation signatures of pediatric acute myeloid leukemia and normal blood progenitors associated with differential patient outcomes. Blood Cancer Discov. 2, 484–499 (2021).
Linder, N. et al. Deep learning for detecting tumour-infiltrating lymphocytes in testicular germ cell tumours. J. Clin. Pathol. 72, 157–164 (2019).
Montesion, M. et al. Somatic HLA class I loss is a widespread mechanism of immune evasion which refines the use of tumor mutational burden as a biomarker of checkpoint inhibitor response. Cancer Discov. 11, 282–292 (2021).
Oosterhuis, J. W. & Looijenga, L. H. J. in Encyclopedia of Cancer 3rd ed. (eds. Boffetta, P. & Hainaut, P.) 121–153 (Elsevier Inc., Amsterdam, The Netherlands, 2019).
Tian, Q., Frierson, H. F. Jr, Krystal, G. W. & Moskaluk, C. A. Activating c-kit gene mutations in human germ cell tumors. Am. J. Pathol. 154, 1643–1647 (1999).
Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
Sosinsky, A. et al. Insights for precision oncology from the integration of genomic and clinical data of 13,880 tumors from the 100,000 genomes cancer programme. Nat. Med. 30, 279–289 (2024).
Tate, J. G. et al. COSMIC: The catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
Martínez-Jiménez, F. et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer 20, 555–572 (2020).
Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 184, 2239–2254.e39 (2021).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
Islam, A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. bioRxiv https://doi.org/10.1101/2020.12.13.422570 (2021).
Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat. Biotechnol. 33, 1152–1158 (2015).
Caulfield, M. et al. National Genomic Research Library [Dataset]. figshare https://doi.org/10.6084/m9.figshare.4530893.v7 (2017).

Acknowledgements

We thank the participants for contributing to this study. This research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (GEL; a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research (NIHR) and NHS England. The Wellcome Trust, Cancer Research UK, and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by participants and collected by the National Health Service as part of their care and support. This work was co-funded by the NIHR Manchester Biomedical Research Centre (NIHR203308). C.V. is partly funded by the NIHR Oxford Biomedical Research Centre (BRC). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. A.J.C., D.C., B.K., and R.S.H. were supported by funding from the Wellcome Trust (214388) and Cancer Research UK (C1298/A8362). The authors would like to acknowledge Dr Sam Behjati and Thomas Oliver of the Wellcome Sanger Institute for interesting discussions and Peter O’Donovan for technical support.

Author information

A full list of members and their affiliations appears in the Supplementary Information.
These authors contributed equally: Anna Frangou, Ben Kinnersley.
These authors jointly supervised this work: Andrew Protheroe, Matthew J. Murray, David C. Wedge, Clare Verrill.

Authors and Affiliations

Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
Máire Ní Leathlobhair, Iliana Peneva, Atef Sahli & David C. Wedge
Ludwig Institute for Cancer Research, Nuffield Department of Medicine, University of Oxford, Oxford, UK
Máire Ní Leathlobhair
Department of Microbiology, Moyne Institute of Preventive Medicine, School of Genetics and Microbiology, Trinity College Dublin, Dublin, Ireland
Máire Ní Leathlobhair & Evie M. Smyth
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Anna Frangou
Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
Ben Kinnersley, Alex J. Cornish, Daniel Chubb, Philip Law & Richard S. Houlston
University College London Cancer Institute, 72 Huntley Street, London, UK
Ben Kinnersley
Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden
Eszter Lakatos
Genomics England, London, UK
Prabhu Arumugam, Vincent Wolverson & Alona Sosinsky
Department of Biology, University of Konstanz, Universitaetsstrasse 10, D-78464, Konstanz, Germany
Andreas J. Gruber
Manchester Cancer Research Centre, The University of Manchester, Manchester, UK
Avraam Tapinos
Division of Cancer Sciences, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
G. Maria Jakobsdottir, Atef Sahli & David C. Wedge
Christie Hospital, The Christie NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
G. Maria Jakobsdottir, Alexander T. J. Lee & David C. Wedge
Norfolk and Norwich University Hospitals NHS Foundation Trust, Norwich, UK
Richard Y. Ball
Guy’s and St Thomas’ NHS Foundation Trust, London, UK
Rushan Sylva
Department of Pathology, The Royal Wolverhampton NHS Trust, Wolverhampton, UK
Ksenija Benes
Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, UK
Dan Stark
Weston Park Cancer Centre, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
Robin J. Young
Department of Oncology, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
Andrew Protheroe
Department of Paediatric Haematology and Oncology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
Matthew J. Murray
Department of Pathology, University of Cambridge, Cambridge, UK
Matthew J. Murray
NIHR Oxford Biomedical Research Centre, Oxford, UK
Clare Verrill
Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
Clare Verrill


Consortia

Testicular Cancer Genomics England Clinical Interpretation Partnership Consortium

Richard Y. Ball, Ksenija Benes, Daniel Chubb, Alex J. Cornish, Anna Frangou, Ben Kinnersley, Alexander T. J. Lee, Matthew J. Murray, Máire Ní Leathlobhair, Andrew Protheroe, Rushan Sylva, Dan Stark, Clare Verrill, David C. Wedge & Robin J. Young

Genomics England Research Consortium

Alona Sosinsky

Contributions

C.V., M.J.M., A.P., and D.C.W. planned the study. V.W. acted as a participant representative for the GeCIP. R.Y.B., R.S., K.B., D.S., R.J.Y., and A.T.J.L. coordinated participant recruitment and sampling at Genomic Medicine Centres (GMCs). M.N.L., A.F., B.K., A.J.C., D.C., A.J.G., P.A., P.L., I.P., G.M.J., A. Sahli, A. Sosinsky, A.T., E.M.S., and E.L. contributed to data processing and handling, quality control, and analysis. R.S.H. supervised parts of the analysis. M.N.L. wrote the manuscript with input from all co-authors.

Corresponding authors

Correspondence to Máire Ní Leathlobhair, Matthew J. Murray, David C. Wedge or Clare Verrill.

Ethics declarations

Competing interests

Genomics England is a company wholly owned by the UK Department of Health and Social Care and was created in 2013 to introduce WGS into healthcare in conjunction with NHS England. Authors affiliated with Genomics England are, or were, salaried by or seconded to Genomics England (A. Sosinsky, P.A.) or act as a GeCIP-specific patient/participant representative (V.W.). All other authors declare they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Peer review

Peer review information

Nature Communications thanks Toshihiro Shioda, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Ní Leathlobhair, M., Frangou, A., Kinnersley, B. et al. Genomic landscape of adult testicular germ cell tumours in the 100,000 Genomes Project. Nat Commun 15, 9247 (2024). https://doi.org/10.1038/s41467-024-53193-6


Received: 16 March 2023
Accepted: 03 October 2024
Published: 26 October 2024
DOI: https://doi.org/10.1038/s41467-024-53193-6