Background

Uterine leiomyomas (ULs) are benign neoplasms of the myometrium. Up to 70% of women develop ULs by the age of 50, and approximately one in four develop symptoms, including abnormal uterine bleeding, pelvic pain and infertility [1, 2]. Currently, the only curative treatment options are invasive, and one third or even half of all hysterectomies are conducted for UL patients [3]. Due to the high prevalence and lack of non-invasive treatments, UL related costs have been estimated to exceed the combined cost of colorectal and breast cancer in the United States [4].

Multiple molecular subtypes of ULs with typically mutually exclusive genetic drivers have been discovered: hotspot mutations of MED12, overexpression of HMGA2, biallelic inactivation of FH, SRCAP complex genes involved in histone H2A.Z loading and a group of genes affecting neddylation of Cullin 3-RING E3 ligase [5,6,7,8,9,10]. In addition, HMGA1 and PLAG1 overexpression sometimes due to chromosomal translocation and COL4A5-COL4A6 deletions through IRS4 overexpression drive UL tumourigenesis [11, 12]. The genetic driver alteration remains undiscovered for about 10% of ULs. Possible secondary genetic driver alterations are not well understood.

Fibroblast growth factor (FGF)/FGF receptor (FGFR) signalling plays a crucial role in the regulation of cell differentiation, proliferation and apoptosis in multiple cell types both in embryonic development and in adults [13]. FGFRs are transmembrane tyrosine kinase receptors that mediate the FGF signalling cues to the cytosol, activating RAS/MAPK, PI3K/AKT and DAG/PKC signalling pathways [13].

In humans, FGFRs are encoded by four genes, FGFR1-4. Activating mutations of FGFR genes, including point mutations, chromosomal rearrangements, fusion genes and copy number amplifications, are seen in 5–10% of all human cancers [14]. FGFR1 mutations tend to aggregate at codons encoding two hotspot amino acids, asparagine 546 and lysine 656, in the tyrosine kinase domain [15]. FGFR2 mutations, which are common in endometrial, breast and lung cancer, are also seen in the transmembrane and extracellular domains [15, 16]. FGFR1 amplifications are very common in many cancer types, including lung and breast cancer, whereas FGFR2-4 amplifications are less frequent but occur, for example, in gastric and breast cancer [14, 16]. FGFR1-3 fusion genes have been reported with multiple partner genes in many malignancies, including glioblastoma, breast and lung cancer [16].

Previous studies have suggested that myometrial and endometrial FGFR1 expression levels vary at different stages of the menstrual cycle, but that this fluctuation is disturbed in ULs and in the endometrium of UL patients, suggesting that FGFR1 might play a role in the regulation of the menstrual cycle and in the tumourigenesis of ULs [17, 18]. In addition, it has been shown that FGFR1 gain can lead to FGFR1 overexpression in ULs [19]. However, such changes are somewhat unspecific, and activating FGFR point mutations have not, to our knowledge, been reported. Here, we analysed multiple layers of sequencing data and SNP array data and show that activating mutations in FGFR genes do indeed occur in a subset of ULs, strongly supporting the role of FGFR signalling in the genesis of ULs. Furthermore, examination of genome-wide association studies (GWAS) revealed a possible role of FGFR4 in genetic predisposition to ULs.

Methods

Sample collection

The current Finland Myoma Study sample set comprises 2677 UL samples from 863 individuals, and the corresponding normal myometrium samples for each patient (Fig. 1a, Table S1). This sample set largely overlaps the previously published sample set [9]. UL subtypes were defined as described in Supplementary Material and Methods (Fig. S1). In addition, the Swedish sample set of 372 ULs from 147 patients was collected from hysterectomy and myomectomy patients at Danderyd Hospital. All samples were collected after written informed consent was obtained.

Fig. 1: Summary of methods.

a All methods used in this study to detect FGFR gene alterations in the Finland Myoma Study collection (n = 2677). b Data types utilised to screen FGFR mutations in the Finnish and Swedish UL sample sets.

DNA and RNA extractions

For the Sanger sequencing, Nanopore long-read sequencing and SNP array, DNA was extracted from the fresh frozen tissues using QIAamp Fast DNA Tissue Kit (Qiagen, Hilden, Germany) or FastDNA Kit (MP Biomedicals LLC, Santa Ana, California, USA). For the previously published whole-exome sequencing (WES) and whole-genome sequencing (WGS) data, samples were processed as previously described [5, 20, 21]. For RNA extractions, we used TRIzol Reagent (Invitrogen, Waltham, Massachusetts, USA), RNase-free DNase (Qiagen) and RNeasy MinElute clean-up kit (Qiagen). RNA was converted to complementary DNA (cDNA) by M-MLV Reverse Transcriptase (Promega, Madison, Wisconsin, USA) following standard reverse transcription procedures [9].

RNA-sequencing

RNA-sequencing (RNA-seq) data were generated from altogether 410 ULs and 151 corresponding normal myometrium samples from 324 patients, in part overlapping with a previously published RNA-seq dataset [9]. Representative samples of the MED12, FH and HMGA2 subtypes were selected for this dataset, but ULs without an identified driver mutation were the primary focus. The RNA libraries were prepared using Illumina TruSeq Stranded Total RNA with Ribo-Zero human kit (Illumina, San Diego, California, US) following the manufacturer’s instructions. RNA-seq was performed with Illumina HiSeq 2500 in SciLife for 124 samples and with NovaSeq 6000 in Macrogen for 437 samples [9].

We used Trim Galore v.0.5.0 for the quality and adaptor trimming. The reads were aligned with Salmon v1.8.0 to the default salmon transcriptome reference (accessed on May 23, 2022) using the partial selective alignment method. DESeq2 v1.34.0 and limma v3.50.0 were used to produce variance stabilised, batch-corrected gene expression values [22, 23]. Unsupervised hierarchical clustering was done utilising Ward linkage and 1-correlation distance across the n = 1355 (5%) most variable genes. The analysis of differentially expressed genes between sample groups was done using DESeq2 v1.34.0, and log-fold-change shrinkage was done using the option type = ‘apeglm’ [22]. Only genes with a coverage of at least 10 reads in at least five samples were analysed (28,199 genes), and the sequencing batch was given as a confounder. For the multiple test correction, DESeq2 uses the Benjamini-Hochberg method to produce adjusted p values (padj).
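The Benjamini-Hochberg adjustment mentioned above can be illustrated with a minimal Python sketch. This is not the DESeq2 implementation (which additionally applies independent filtering before adjustment); the function name and the example p values are hypothetical and serve only to show how the adjusted values are derived.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four genes (illustration only).
example_padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Each raw p value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone.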

For the mutation calling, STAR v2.7.9a 2-PASS alignment was performed to align the reads to the reference genome GRCh38 [24]. To prepare the data for mutation calling, Genome Analysis Toolkit (GATK) v4.2.4.1 tools MarkDuplicates, SplitNCigarReads, BaseRecalibrator and ApplyBQSR were used [25]. The 1000 Genomes Project, gnomAD, and Mills SNVs and indels, and GATK’s indels were given as known variable sites for the GATK BaseRecalibrator. We used GATK Mutect2 for the variant calling with the options --dont-use-soft-clipped-bases and --pcr-indel-model AGGRESSIVE. The panel of normals was created by GATK GenomicsDBImport and GATK CreateSomaticPanelOfNormals from 40 random normal myometrium samples. Mutations were filtered against the panel of normals, with the maximum median fragment length difference of 20,000 and based on strand bias, and against the HaplotypeCaller variant calls of the corresponding normals. Using BasePlayer v2, we annotated variants with the MANE transcriptome [26, 27]. Variants were further filtered against gnomAD exome v2.1.1 and genome v3.0 databases utilising ANNOVAR [28, 29]. A minimum coverage of 10 and a minimum allelic fraction of 1/3 were required. Identical deletions seen in at least two UL samples were filtered out. The InterPro database was used to obtain the domain information [30].
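The final coverage and allelic-fraction thresholds can be sketched as a simple predicate. The function name is hypothetical, and the sketch assumes that coverage is the sum of reference and alternative reads at the site; the actual pipeline may count depth differently.

```python
from fractions import Fraction

MIN_COVERAGE = 10                       # minimum read depth stated in the text
MIN_ALLELIC_FRACTION = Fraction(1, 3)   # minimum mutant allele fraction stated in the text


def passes_filters(ref_reads, alt_reads):
    """Return True if a variant call meets both thresholds.

    Assumption: depth is ref_reads + alt_reads (other alleles ignored).
    """
    depth = ref_reads + alt_reads
    if depth < MIN_COVERAGE:
        return False
    # Exact rational comparison avoids floating-point edge cases at 1/3.
    return Fraction(alt_reads, depth) >= MIN_ALLELIC_FRACTION
```

Using exact fractions means that a call with, for example, 4 mutant reads out of 12 sits exactly on the threshold and is retained.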

Fusion genes were identified using the FusionCatcher (v1.33) software and the default FusionCatcher database (hg38, Ensembl v102) [31]. The resulting fusion calls were filtered to a minimum of 3x coverage based on the ‘spanning unique reads’ and ‘spanning pairs’ values. All in-frame fusions involving FGFR genes were then inspected for further supporting evidence in Nanopore DNA sequencing data.

FGFR1/2 mutation screening and validation

The results from RNA-seq mutation calling led us to screen for FGFR mutations in the whole Finnish sample set of 2677 ULs, utilising all the readily available sequencing data as well as Sanger sequencing of samples where no data was available. For the samples with NGS data (n = 432), mutation calls of FGFR from RNA-seq, WGS and WES data were used (Fig. 1b, Table S1). The minimum read depth of 15 at the three most common FGFR1 hotspots and the four most common FGFR2 hotspots according to the Catalogue Of Somatic Mutations In Cancer (COSMIC) was required with RNA-seq data [15]. WGS and WES data were processed as previously described [20, 21]. Germline variants were filtered out against gnomAD exome v2.1.1 and genome v3.0 databases utilising ANNOVAR [28, 29]. Liftover from GRCh37 to GRCh38 was performed for the variant calls of WES and CompleteGenomics WGS data using GATK LiftoverVcf [25]. The support of at least three reads was required, and WES and Illumina WGS variants were annotated with the MANE transcriptome using BasePlayer v2 [26, 27].

Sanger sequencing covering areas encoding the known cancer hotspot codons 546 encoding asparagine in FGFR1 and 659 encoding lysine in FGFR2 was performed in samples with no NGS sequencing data available (n = 2245), and for the Swedish UL sample set (n = 372). For the Finnish sample set, the areas covering the known cancer hotspot codons 549 encoding asparagine in FGFR2 and codon 656 encoding lysine in FGFR1 were also screened (see Fig. 2). Variants detected in NGS data were validated by Sanger sequencing. The PCR amplification and Sanger sequencing were performed as previously described [5].

Fig. 2: Single-nucleotide mutations in FGFR1/2.

a Altogether eight single-nucleotide mutations were found in FGFR1/2 in the Finland Myoma Study Collection. b, c Distribution of single-nucleotide mutations in FGFR1/2 in the COSMIC database of cancer mutations and the Sanger validated mutations in FGFR1/2 mutation hotspots found in our UL sample set. The domain information was obtained from the InterPro database [30].

Possible enrichment of FGFR mutations to specific UL subtypes was examined with Fisher’s exact test. Only one UL per subtype per patient was included in the analysis, and ULs with more than one known subtype were excluded.
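For readers unfamiliar with the test, a two-sided Fisher’s exact test on a 2x2 table can be sketched in plain Python as below. The function name is hypothetical and the sketch is not the software used in the study. The odds ratio returned here is the simple cross-product ratio, whereas some statistical packages report a conditional maximum-likelihood estimate instead, so values may differ slightly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (cross-product odds ratio, p value).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)
```

In the setting described above, the rows would be the UL subtype groups and the columns the presence or absence of an FGFR alteration, with one UL per subtype per patient.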

Long-read sequencing

The Nanopore long-read sequencing data from altogether 283 ULs and 106 normal myometrium samples, in part previously published [9], were generated and processed as described in Supplementary Material and Methods. Methylation values were calculated for each CpG as the proportion of methylated reads from all reads. A read coverage of 3-75 reads was required for each CpG utilised in the analysis. Average methylation at the promoter region (1000 bp upstream from transcription start site [TSS]) was defined as the average methylation of the CpGs overlapping the promoter region. A minimum of 5 measured CpGs from the promoter was required for all samples. Locations for promoter regions were downloaded from UCSC using the Table Browser tool [32]. The linear effect due to the pore version was subtracted from the average promoter methylation. In addition, per CpG methylation values were collected for each FGFR gene 15Kbp upstream and downstream from the TSS. The geom_smooth()-function from R-package ggplot2 was used for plotting the regional methylation levels [33].
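The per-CpG and promoter-level methylation calculation described above can be sketched as follows. The function name and input layout are hypothetical, and the sketch omits the subtraction of the linear pore-version effect.

```python
MIN_READS, MAX_READS = 3, 75   # per-CpG coverage window stated in the text
MIN_CPGS = 5                   # minimum measured CpGs per promoter stated in the text


def promoter_methylation(cpgs):
    """Average methylation over CpGs overlapping one promoter region.

    cpgs: list of (methylated_reads, total_reads) tuples.
    Returns None when fewer than MIN_CPGS CpGs pass the coverage window.
    """
    # Per-CpG methylation is the proportion of methylated reads.
    values = [
        methylated / total
        for methylated, total in cpgs
        if MIN_READS <= total <= MAX_READS
    ]
    if len(values) < MIN_CPGS:
        return None
    return sum(values) / len(values)
```

Returning None rather than a number for sparsely covered promoters keeps such samples out of downstream comparisons instead of silently averaging over too few sites.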

Single-nucleotide variants were called using Longshot (v.0.4.3). Variants seen with significant strand-bias or with excessive alternative allele base-calling (support of four or more reads across at least 40 samples) were filtered out as technical artefacts. Low-quality regions were defined in 250 bp windows if at least 10 normal samples showed coverage <5 with mapq>10 reads. Variants were filtered against the panel of normals formed by the normal myometrium samples using bcftools v.1.9 and lifted over to GRCh38 using GATK LiftoverVcf [25, 34]. Variants were further annotated with the MANE transcriptome and AlphaMissense predictions using BasePlayer v2 and filtered against gnomAD exome v2.1.1 and v4.0 and genome v3.0 and v4.0 databases [26, 27, 35]. A minimum of four mutant allele reads was required. The mutation calls were inspected for the mutation hotspots of FGFR genes based on the COSMIC database [15].

Structural variants (SVs) were called using cuddlySV v3.0.0 (https://github.com/kpalin/cuddlySV) [36, 37]. At least four supporting reads were required for each SV call. SVs seen in myomas of at least two different patients within a 5 bp window were excluded as probable artefacts. Only SVs with a breakpoint within 1 Mbp of an FGFR gene were included. For intrachromosomal alterations, a minimum length of 350 bp was required. The last filtering step was visual inspection of each SV call in Integrative Genomics Viewer (IGV) [38].
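The numeric SV filters can be summarised in a short sketch. The class and field names are hypothetical and do not correspond to the cuddlySV output format; the recurrence filter across patients and the visual inspection step are not modelled.

```python
from dataclasses import dataclass


@dataclass
class StructuralVariant:
    supporting_reads: int
    distance_to_fgfr_bp: int   # distance from the nearest breakpoint to an FGFR gene
    intrachromosomal: bool
    length_bp: int = 0         # only meaningful for intrachromosomal events


def keep_sv(sv):
    """Apply the read-support, distance and length thresholds from the text."""
    if sv.supporting_reads < 4:
        return False
    if sv.distance_to_fgfr_bp > 1_000_000:
        return False
    if sv.intrachromosomal and sv.length_bp < 350:
        return False
    return True
```

Interchromosomal events such as translocations skip the length check, since a length is not defined for them.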

SNP array data

Altogether 2,030 ULs were genotyped with HumanOmni 2.5–8 (n = 88) or Illumina Infinium HumanCore-24 chips (n = 1,942). Allelic imbalance segments were identified as previously described [9]. Somatic copy number variation analysis was performed using ASCAT v3.1.2 [39] for Illumina Infinium HumanCore-24 samples excluding clonally related tumours as earlier described (n = 1922) [9]. Regions where the number of copies exceeded the predicted overall ploidy of the sample by over 20% were considered as a somatic gain.
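The gain criterion can be written as a one-line rule. The sketch below assumes that "exceeding the ploidy by over 20%" means a segment copy number greater than 1.2 times the predicted sample ploidy; the function name is hypothetical and the ASCAT output parsing is not shown.

```python
def is_somatic_gain(segment_copies, sample_ploidy, margin=0.20):
    """Return True if the segment copy number exceeds ploidy by more than margin."""
    return segment_copies > sample_ploidy * (1 + margin)
```

Under this reading, three copies in a near-diploid tumour (ploidy 2.0) count as a gain, whereas three copies in a tumour with a predicted ploidy of 2.6 do not.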

Quantitative PCR

FGFR1 expression was assessed using real-time quantitative PCR (qPCR). We used TaqMan probes for FGFR1 (Hs00241111_m1 and Hs00915142_m1) and the housekeeping gene HPRT1 (Hs02800695_m1) with the 7500 Fast Real-Time PCR System (Thermo Fisher Scientific, Waltham, Massachusetts, US). The samples were analysed as triplicates. A MED12-mutated UL was used as a reference sample to compare the other samples against.
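Relative quantification against a reference sample and a housekeeping gene is commonly computed with the comparative Ct (2^-ΔΔCt) method. The text does not name the exact calculation, so the sketch below is an assumption; it also assumes 100% amplification efficiency, and the function and parameter names are hypothetical.

```python
from statistics import mean


def relative_quantity(target_ct, housekeeping_ct, ref_target_ct, ref_housekeeping_ct):
    """Comparative Ct estimate of target expression relative to a reference sample.

    Each argument is a list of replicate Ct values (e.g. triplicates).
    """
    # Normalise the target gene to the housekeeping gene within each sample.
    delta_sample = mean(target_ct) - mean(housekeeping_ct)
    delta_reference = mean(ref_target_ct) - mean(ref_housekeeping_ct)
    # A lower Ct means more template, hence the negative exponent.
    return 2 ** -(delta_sample - delta_reference)
```

Here the target would be FGFR1, the housekeeping gene HPRT1, and the reference sample the MED12-mutated UL, which by construction has a relative quantity of 1.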

Genome-wide association study

The GWAS was compiled as a meta-analysis of three cohorts: FINNGEN (https://finngen.fi/, release 10, accessed June 2024) [40], UK Biobank (UKB, Application Number 80756, accessed June 2021) and Biobank Japan (BBJ, http://jenger.riken.jp/en/, accessed September 2020). The FINNGEN cohort had a total of 34,422 UL cases and 195,888 female controls. The UKB cohort of white British women comprised a total of 18,014 UL cases and 202,535 female controls. The BBJ cohort comprised a total of 5,954 UL cases and 95,010 female controls. The phenotype definition, population stratification and genotype quality-control steps were described in our previous publication [9]. All cohort-wise summary statistics were precomputed with mixed model logistic regression (SAIGE); details are available at the sources listed above. The BBJ and UKB data were lifted over to GRCh38/hg38 coordinates (Picard LiftoverVcf). An inverse-variance weighted fixed effects meta-analysis (R package ‘meta’ v4.8-4) was applied to 6.3 million SNPs that were available from all three cohorts. Expression quantitative trait loci (eQTL) were calculated as an additive effect of the GWAS risk allele using DESeq2, while adjusting for patients’ age at hysterectomy, ancestry (principal component analysis of population structure) and RNA-seq batch.
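The inverse-variance weighted fixed effects model can be sketched for a single SNP as follows. This is not the R package ‘meta’ itself; the function name is hypothetical, and the inputs are assumed to be per-cohort log odds ratios and their standard errors.

```python
from math import erfc, sqrt


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed effects estimate for one SNP.

    betas: per-cohort log odds ratios.
    standard_errors: per-cohort standard errors of the betas.
    Returns (pooled beta, pooled standard error, two-sided p value).
    """
    # Each cohort is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2.0))
    return pooled_beta, pooled_se, p_value
```

Larger cohorts have smaller standard errors and therefore dominate the pooled estimate, and the pooled standard error is always smaller than that of any single cohort.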

Results

Somatic point mutations in FGFR1 and FGFR2

In the mutation calling from RNA-seq data, we identified altogether nine genes that were mutated in at least four ULs with different mutations (Table S2). Five of these genes (MED12, FH, ACTL6A, DMAP1, YEATS4) were already known UL driver genes. One gene, FGFR2, showed a well-known cancer-related hotspot mutation p.Lys659Glu [15] and three other mutations. Closer investigation of the FGFR1-4 genes revealed FGFR1 mutations at the hotspot codon 546 encoding asparagine in three ULs (Fig. 2). These results led us to screen FGFR genes in the whole UL collection, and we detected another cancer-related hotspot mutation, FGFR1 c.1966A>A/G p.Lys656Glu (Fig. 2, Table S3).

All FGFR mutations except for FGFR2 p.Pro363Ser were validated as somatic (Fig. 2, Table S3). For the mutations close to intron-exon boundaries, FGFR2 c.1288-17T>G and p.Pro363Ser, cDNA sequencing was also performed, but no splice effect was recognised.

FGFR2-ERC1 fusion gene and structural variants in FGFR genes

In the RNA-seq dataset, an in-frame fusion gene of FGFR2 was identified in one UL without any other known driver alterations (My6006m1). The first 17 exons of FGFR2 were fused with exons 5-19 of ERC1 (Fig. 3). This alteration was also confirmed by SV analysis of Nanopore long-read sequencing data. No in-frame fusion genes were found for the three other FGFR genes.

Fig. 3: An in-frame FGFR2-ERC1 fusion gene in one UL.

The Nanopore DNA-sequencing data of UL sample My6006m1 are visualised in IGV at the two locations showing the breakpoints of the somatic translocation leading to the fusion gene. The break positions of the split reads, shown in red, mark the breakpoints. In the resulting fusion gene, exons 1-17 of FGFR2 are fused with exons 5-19 of ERC1.

To detect possible FGFR-related SVs from the Nanopore data of 283 ULs, we looked for breakpoints within 1 Mbp of the FGFR genes and identified 18 FGFR1 SVs in 5 ULs, 7 FGFR2 SVs in 4 ULs, 20 FGFR3 SVs in 14 ULs (from 12 patients), and 7 FGFR4 SVs in 6 ULs (Table S4). Apart from the FGFR2-ERC1 fusion, no intragenic breakpoints were found in FGFR genes. RNA-seq data were available for 27/28 of the ULs with FGFR SVs. However, only one of these ULs, My1004m1 without a known driver alteration, showed slightly higher FGFR4 expression (Fig. S2). This lesion had multiple SVs in chromosome 5, and the closest breakpoints from FGFR4 were 491 Kbp downstream from FGFR4.

Gains of chromosome 8 driving FGFR1 overexpression

Of the 2030 ULs with SNP array data, altogether eight ULs displayed a chromosomal gain overlapping FGFR1 (Fig. 4a, Table S5). RNA-seq data were available for five of these ULs, and three of them were the three samples with the highest FGFR1 expression in the whole RNA-seq dataset (Fig. 4b). FGFR1 expression levels were evaluated using qPCR for the samples with no RNA-seq data, and FGFR1 overexpression was confirmed in one additional UL sample, My5007m2 (Fig. 4c). Thus, altogether 4/8 ULs with a gain overlapping FGFR1 showed overexpression of FGFR1 (Table S5). Two of these tumours, My5007m1 and My5007m2, were clonally related. A closer look at these four ULs revealed that tumours My5007m1, My5007m2, and My6450m1 had a whole chromosome 8 gain of both homologous chromosomes. UL My6467m2 had gained almost the whole chromosome 8 except for a small part of the p-arm (Fig. S3).

Fig. 4: Chromosomal gains and expression of FGFR genes in ULs.

a Chromosomal gains overlapping FGFR genes in ULs with SNP array data (n = 2030). b FGFR1 expression levels in different UL subtypes and normal myometrium samples. ULs with chromosomal gains overlapping FGFR1 are circled with a black line. Of these, sample names are shown for the cases with clear overexpression. c FGFR1 qPCR results for the ULs with an amplification of FGFR1 but no RNA-seq data. My5007m1 was used as a positive control, and random MED12 and HMGA2 ULs (My6282m7, My6326m1) as negative controls. The MED12 UL was used as a reference sample for calculating the relative quantification (RQ) values. d–f FGFR2-FGFR4 expression levels in different UL subtypes and normal myometrium samples. None of the ULs with gained regions overlapping these genes showed clear overexpression.

Large gained regions overlapping the other three FGFR genes were also seen in many ULs: one with FGFR2, eight with FGFR3 and five with FGFR4. However, no overexpression of these genes was seen in these ULs with the available RNA-seq data (Table S5, Fig. 4d–f).

Hypomethylation at FGFR loci

We examined CpG methylation profiles 15Kbp upstream and downstream from FGFR TSSs (Fig. S4). HMGA2 ULs seemed to show slightly lower methylation levels around the second intron of FGFR2 compared to other UL subtypes and normal myometrium samples (Fig. S4). We compared the average promoter methylation values to expression levels of FGFR genes and observed a group of ULs characterised by low methylation on promoters and high expression of FGFR2 (Figs. 5a, S5). While these promoter regions belong to alternative transcript isoforms, in the canonical FGFR2 transcript the hypomethylation pattern locates to the first intron and the second exon (Fig. 5b, c). These results suggest that hypomethylation of the promoter, first intron and second exon may be related to FGFR2 overexpression in a subset of ULs. This group of ULs included mostly HMGA2/1 ULs, a few ULs without a known driver and one COL4A5/6 UL.

Fig. 5: Methylation of FGFR2 in ULs.

a Total FGFR2 expression level plotted against FGFR2 promoter (1Kbp upstream from TSS) methylation level for each UL and myometrium sample with both RNA-seq and Nanopore data. Similar figures for each promoter region of alternative FGFR2 transcripts can be found in the Supplementary material (Fig. S5). A group of ULs, mostly HMGA2/1 and unknown subtypes show promoter hypomethylation and overexpression of FGFR2 (circled in purple). b Methylation levels per CpG island 15Kbp upstream and downstream from TSS in different UL subtypes and normal myometrium samples (only Nanopore batch 1 data). The samples circled with purple that were in batch 1 (the seven HMGA2 samples) in (a) are separated to their own subclass ‘hypomethylated_overexpressing’. c Methylation levels per CpG island 15Kbp upstream and downstream from TSS for each sample of Nanopore batch 1 data. HMGA2 ULs show clearly lower methylation of FGFR2.

PLAG1 ULs showed a very clear FGFR3 overexpression in RNA-seq compared to any other UL subtype or normal myometrium samples (log-fold-change 2.2–3.9, padj<1e-03) (Fig. 4e). In the promoter methylation analysis, the region of 1Kbp immediately upstream from FGFR3 did not show hypomethylation. However, PLAG1 ULs showed slightly lower methylation levels about 3-8Kbp upstream from the FGFR3 TSS compared to other UL subtypes (Fig. S4).

FGFRs in UL predisposition

A GWAS comprising 58,390 UL cases and 493,433 female controls was inspected for inherited UL risk near the FGFR genes. The results revealed a genome-wide significant UL association locus 80Kbp upstream from FGFR4: the SNP rs353491 (hg38.chr5:177005946.C>G) was associated with a meta-analysis P-value of 4.2e-13 and the same risk allele (G) in all three cohorts: FINNGEN (odds ratio [OR] = 1.04, P = 4.8e-5), UKB (OR = 1.07, P = 7.0e-10) and BBJ (OR = 1.04, P = 4.8e-2). This region has also been implicated in previous UL GWAS efforts; however, the target gene has remained unclear [41, 42]. Subsequent analysis of gene expression identified FGFR4 as the most significant eQTL of the risk allele rs353491-G in both myometrium (log-fold-change = 0.32, P = 0.008) and tumour RNA-seq (log-fold-change = 0.27, P = 0.004). No genome-wide significant GWAS loci were found near FGFR1-3.

Clinical information about the FGFR mutated ULs

The majority of the ULs with coding FGFR mutations, fusion gene, or FGFR gene gain and overexpression (10/12) were classified as conventional ULs in the histopathological evaluation (Table 1). The remaining two were evaluated as cellular and hypercellular ULs, and one of them was also atypical. The number of ULs varied from one to five in these patients. Compared to the patients with somatic FGFR alterations, the patient with the germline FGFR2 mutation (My6176) had the youngest age at diagnosis (33 y, mean in other patients 45.9 y) and at hysterectomy (35 y, mean in other patients 51.3 y). The tumour size was also the largest in this patient (17 cm), whereas the diameter of the other ULs with FGFR mutations ranged from 0.9 cm to 12 cm (Table 1).

Table 1 Clinical characteristics of the patients with coding FGFR mutations, fusion gene, or FGFR gene gain and overexpression.

Discussion

In this study, we have shown that FGFR genes are activated in a subset of ULs via multiple mechanisms. We encountered hotspot mutations in FGFR1/2, other coding mutations in FGFR2, copy number gains leading to overexpression of FGFR1 and an FGFR2 fusion gene. In addition to genetic alterations, our results indicate that hypomethylation of regulatory regions at FGFR2 and FGFR3 and consequent overexpression may contribute to UL genesis.

Altogether, five out of the eight FGFR1/2 point mutations detected in our UL collection occurred in highly specific mutation hotspots that are established drivers of tumourigenesis. All the point mutations in FGFR1 occurred in hotspot amino acids (p.Asn546Lys, p.Asn546Asp, p.Lys656Glu) that have previously been reported in many cancer types, including glioma and breast carcinoma [15]. FGFR2 p.Lys659Glu, which is homologous to the aforementioned FGFR1 p.Lys656Glu, has earlier been seen in breast and endometrial carcinomas [15].

The AlphaMissense pathogenicity prediction for FGFR2 p.Met391Arg is as high as for the hotspot mutations, suggesting that it probably is pathogenic (Table S3). The predicted pathogenicity value is much lower for the only germline mutation found in this study, FGFR2 p.Pro363Ser. However, this mutation is not a known variant in the gnomAD database [28]. The youngest age at diagnosis and hysterectomy of this patient as well as the largest tumour size compared to somatically mutated FGFR cases indicate that this germline change may predispose to UL. Our GWAS results support the notion that inherited FGFR-related variants play a role in UL predisposition, as we found a genome-wide significant locus 80Kbp upstream from FGFR4 associated with the UL phenotype in the FINNGEN, BBJ and UKB cohorts. The most significant eQTL for the risk allele was FGFR4; the risk allele carriers showed increased FGFR4 expression in both myometrium (P = 0.008) and ULs (P = 0.004).

Here, we also report an in-frame FGFR2-ERC1 fusion gene in one UL. In this alteration, all the functional domains of FGFR2 were preserved. The regions encoding functionally important coiled-coil domains of ERC1 were preserved for the most part. A similar FGFR2-ERC1 fusion gene has previously been reported in lung adenocarcinoma [43]. ERC1 has also been reported as a fusion partner gene for other transmembrane tyrosine kinase receptors, RET and ALK. ERC1-RET fusions have been reported in pancreatic ductal adenocarcinoma, papillary thyroid carcinoma and lung cancer [44,45,46]. In turn, ERC1-ALK fusions have been seen in sarcomas and non–small cell lung carcinoma [47, 48]. ERC1 encodes ELKS/RAB6-Interacting/CAST Family Member 1, which functions as a scaffold at the active zone of the presynaptic plasma membrane affecting membrane trafficking, as a regulator of focal adhesion disassembly and in the regulation of insulin secretion [49,50,51].

We found altogether four ULs, two of which were clonally related, with gains of chromosome 8 associated with FGFR1 overexpression. FGFR1 gains and amplifications are the most common FGFR alterations in cancer [16]. Chromosomal rearrangements are known to activate FGFR genes in cancer [14]. In our dataset, we found one FGFR4 overexpressing UL with multiple SVs in chromosome 5 with the closest breakpoints from FGFR4 about 491Kbp upstream that may explain the slight FGFR4 overexpression.

Investigation of FGFR methylation levels suggests that FGFR2 and FGFR3 hypomethylation may lead to overexpression of these genes in a subset of ULs. FGFR2 hypomethylation was seen in the first intron of the canonical transcript that is the promoter region of a few alternative transcripts (Fig. S5). The FGFR2 overexpression was not limited to those transcripts with hypomethylated promoter regions but also transcripts with hypomethylated intron 1 were overexpressed. Indeed, it has been shown that similarly to promoter hypomethylation also the hypomethylation of intron 1 correlates with higher gene expression levels, providing further evidence on the correlation between FGFR2 intronic hypomethylation and increased gene expression [52]. FGFR3 expression was significantly higher in PLAG1 ULs compared to other UL subtypes and normal myometrium. Methylation profiles 15Kbp upstream and downstream of FGFR3 TSS suggest that hypomethylation of the region around 3-8Kbp upstream from FGFR3 may explain the FGFR3 overexpression in at least some of the PLAG1 ULs (Fig. S4).

Interestingly, the great majority (9/12) of the coding FGFR alterations (point mutations in coding regions, the FGFR2-ERC1 fusion, and FGFR1 gain with overexpression) were seen in ULs of HMGA2, HMGA1 and PLAG1 subtypes, suggesting that FGFR alterations are more typical for these subtypes. These three UL subtypes are known to be molecularly similar to each other, probably driving tumourigenesis through PLAG1 overexpression [11]. Indeed, statistical analysis revealed a significant enrichment of coding FGFR alterations in HMGA2/1 and PLAG1 ULs (OR = 10.2, P = 1.9e-04, Fisher’s exact test). This also explains why we only found one FGFR1/2 mutation in the extensive additional Sanger screening effort, as a large proportion of HMGA1/2 and PLAG1 tumours had been included in the discovery set of RNA-sequenced samples (MED12 mutations account for the large majority of all UL cases, and thus, up to 75.3% [2,016/2,677] of ULs in our unselected Finland Myoma Study collection belong to the MED12 subtype) [5, 9]. The prevalence of activating genetic FGFR alterations was 0.45% (12/2,677) in unselected ULs and the estimated prevalence in HMGA2/1 and PLAG1 ULs is 3.0% (9/302). In addition, hypomethylation with overexpression of FGFR2 was seen especially in HMGA2 ULs and of FGFR3 in PLAG1 ULs.
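The percentages quoted above follow directly from the stated counts, as the short arithmetic check below shows. The helper name is hypothetical; the counts are those given in the text.

```python
def percentage(count, total):
    """Return count / total expressed as a percentage."""
    return 100.0 * count / total


# Counts as stated in the text.
prevalence_all = percentage(12, 2677)      # activating FGFR alterations, all ULs
prevalence_subtypes = percentage(9, 302)   # HMGA2/1 and PLAG1 ULs
med12_share = percentage(2016, 2677)       # MED12 subtype share of the collection

print(f"{prevalence_all:.2f}% {prevalence_subtypes:.1f}% {med12_share:.1f}%")
# prints "0.45% 3.0% 75.3%"
```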

It is under debate whether ULs can serve as a precursor lesion for uterine leiomyosarcomas (ULMS). Although most ULMS do not share common genetic drivers with ULs, MED12 hotspot mutations, HMGA2 overexpression and FH inactivation have been shown to occur in some ULMS cases, suggesting that leiomyosarcomas may originate from ULs [53]. As FGFR activating alterations are typical in many different cancer types, the question arises whether FGFR mutated ULs have a higher potential to develop malignancy. To our knowledge, FGFR hotspot mutations have not been reported in ULMS, but FGFR1 duplications, FGFR3 amplifications and deletions and FGFR3/4 overexpression have been identified in these tumours [54, 55]. The genetic landscape of ULMS has not been widely studied, and more research is needed to tackle this question.

Our findings establish the role of FGFR signalling in UL genesis, as previously suggested by others [17,18,19]. Currently, all the curative treatment options for ULs are invasive, and ULs are treated as one entity although differences in drug responses have been shown between UL subtypes [3, 56]. Our results raise the question of whether FGFR inhibitors could be used to treat FGFR mutated ULs in the future, if non-invasive molecular classification of these tumours by cell-free DNA or similar approaches becomes feasible [57]. Three FGFR inhibitors are currently approved for clinical use by the FDA: erdafitinib for metastatic or locally advanced urothelial carcinoma with FGFR3 alterations; pemigatinib and futibatinib for metastatic or locally advanced, unresectable and previously treated cholangiocarcinoma with FGFR2 fusion or rearrangement, and pemigatinib also for relapsed or refractory myeloid or lymphoid neoplasms with FGFR1 rearrangements [58, 59]. FGFR inhibitors are known to have adverse effects, such as hyperphosphatemia, ocular symptoms, and alopecia, and extensive side effects cannot be accepted in the management of a benign disease [60]. However, multiple novel FGFR inhibitors are being developed for various tumour types with FGFR alterations [58]. New FGFR inhibitors with milder adverse effects could potentially be applicable for FGFR driven ULs in the future.