Introduction

Epithelial ovarian cancer is the fifth most common cancer affecting women, comprising nearly a quarter-million cases worldwide each year. Seventy to 80% of cases are high-grade serous ovarian cancer (HGSC), which arises from the surface of the ovary or from the distal fallopian tube1. Complete surgical tumor removal has been the only significant curative treatment for early-stage, non-metastatic ovarian cancer. However, in most cases, surgery is performed only once the disease has reached an advanced stage. In addition to surgery and chemotherapy, which constitute the standard of care for ovarian cancer patients, targeted anticancer agents such as poly(ADP-ribose) polymerase inhibitors (PARPi) and anti-angiogenic drugs have shown promising clinical results2. Despite these advances, the prognosis for ovarian cancer remains poor due to high recurrence rates and drug resistance, highlighting the urgent need for new therapies to improve patient outcomes3. While immune checkpoint blockade (ICB) agents have significantly improved survival in other solid tumor types, monospecific ICB antibodies have shown minimal efficacy in HGSC4.

HGSC infiltration by lymphocytes is associated with higher survival5. A subset of these infiltrating lymphocytes may target tumor antigens, providing protection against the tumor. Consequently, immunotherapy approaches such as therapeutic cancer vaccination, designed to generate and/or enhance tumor-reactive T lymphocytes, represent promising therapeutic strategies. Therapeutic cancer vaccines are designed to elicit an immune response against known tumor-associated or cancer germline antigens, and the full benefit of this immunotherapy depends on choosing the optimal antigen target6,7. Currently, a few well-established vaccine targets are undergoing clinical evaluation for ovarian cancer, including mesothelin, NY-ESO-1, WT-1, and MUC-1. Recent studies have also identified novel, putatively antigenic peptides derived from ovarian tumors8,9. These studies identified tumor antigens directly by applying immunopeptidomics to primary patient tumor material.

In this study, we present a comprehensive approach for discovering candidate ovarian tumor antigens, with a specific focus on tumor peptides that are broadly applicable to HGSC patients. Ideal tumor antigens should exhibit highly tumor-selective expression, be presented by HLA alleles prevalent in the population, and demonstrate strong immunogenicity. To capture the full spectrum of shared tumor antigens, we extended our immunopeptidome search beyond canonical exons and employed a proteogenomic discovery strategy using a personalized reference database. Our candidate antigen selection process incorporates several factors to assess therapeutic potential, such as the dysregulation of source genes in ovarian cancer, the population frequency of target HLA alleles, and the similarity of peptides to human pathogen antigens10,11,12,13.

Results

Proteogenomic approach discovers shared tumor antigens in 11 ovarian cancer patients

To investigate the tumor antigen landscape of HGSC, we collected tumor samples from 11 patients during surgical resection. We initially evaluated the HLA class I (HLA-I)-presented peptides using a direct immunopeptidomics pipeline. HLA-I-binding peptides were enriched using state-of-the-art immunoaffinity purification methods, the eluted peptides were characterized by LC-MS/MS, and the spectra were resolved to peptide sequences using the human canonical UniProt proteome as the reference database. To extend antigen discovery to HLA-I-presented peptides originating outside the canonical proteome, we then performed a second search of the MS/MS spectra against a custom reference database. This database was constructed from patient tumor RNA sequencing reads using the StringTie assembler and represents the personalized transcriptomes of the 11 ovarian tumor patients (Fig. 1).

Fig. 1: Novel approach for tumor antigen identification from ovarian tumor tissues.

Samples are collected, identified by pathology, and biobanked for characterization. Tumor specimens are split to undergo RNA sequencing and HLA-I immunoaffinity purification. The RNA-seq reads, assembled by StringTie, are used to construct a de novo transcriptome and to identify HLA alleles. Peptides purified by immunoaffinity are analyzed via LC-MS/MS and searched against both UniProt proteins and the custom StringTie database. Identified peptides are then ranked by their predicted binding potential (MHCflurry) and by the differential expression of their source genes in ovarian cancer. Further filtering steps eliminate rare HLA types and prioritize peptides shared by multiple patients and binding to more prevalent HLAs. Finally, candidate peptides are evaluated for immunogenicity using PBMCs from healthy donors and from ovarian cancer patients. Infographic was created using BioRender.com.

Using the UniProt reference database, each immunopeptidomics run yielded on average 1633 peptides (range 503–2553). In contrast, matching peptide spectra with the proteogenomic approach resolved roughly half as many peptides, on average 750 per run (range 337–1386) (Fig. 2A). Nevertheless, regardless of whether they were identified using UniProt or StringTie, around 90% of the peptides were 8 to 12 amino acids long (Fig. 2B), and around 40% were 9mers (Fig. 2C).
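
The length statistics summarized in Fig. 2B–D reduce to simple counts over each run's identified sequences. The following minimal Python sketch assumes that a run's identifications are available as a plain list of peptide strings (for example, exported from the search engine); the input list is hypothetical, combining candidate sequences named in this study with two made-up filler sequences, and counting unique sequences only is our assumption.

```python
from collections import Counter


def length_summary(peptides):
    """Summarize peptide lengths for one immunopeptidomics run.

    Returns the length histogram, the fraction of 8-12mers, and the
    fraction of 9mers, each computed over unique sequences.
    """
    unique = set(peptides)  # count each distinct sequence once
    if not unique:
        raise ValueError("no peptides supplied")
    counts = Counter(len(p) for p in unique)
    n = len(unique)
    frac_8_12 = sum(c for length, c in counts.items() if 8 <= length <= 12) / n
    frac_9 = counts.get(9, 0) / n
    return dict(sorted(counts.items())), frac_8_12, frac_9


# Hypothetical run: five 9mers, one 8mer, and one 15mer
run = ["ALKARTVTF", "TYSEKTTLF", "VVHLIKNAY", "KYLTIYLQK", "DRYLLVSQF",
       "SLLDKIIG", "A" * 15]
histogram, frac_8_12, frac_9 = length_summary(run)
print(histogram, round(frac_8_12, 2), round(frac_9, 2))
```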

Fig. 2: Analysis of immunopeptidomics profiles and data quality assessment.

A Number of eluted peptides of all lengths identified in each immunopeptidomics run, using either the UniProt (black) or StringTie (magenta) database. Each dot represents a single immunopeptidomics run. B Percentage of total peptides with lengths between 8 and 12 amino acids. C Percentage of total peptides that are 9 amino acids long. D Distribution of peptide lengths, shown as the number of eluted peptides for each dataset. E Percentage of 9mer peptides predicted to be binders in each immunopeptidomics run. Binder classification was performed using MHCflurry and the corresponding set of HLA-I alleles for each sample; peptides were classified as HLA-I ligands (binders) when their predicted “affinity percentile” was lower than 2. F Overall overlap between the 9mer datasets obtained by matching against the UniProt (dark gray) or StringTie (magenta) database. G Percentage of 9mer peptides found in the IEDB repository. H Predicted HLA binding affinity profiles of peptides found or not found in the IEDB repository, for both the UniProt- and StringTie-derived peptide datasets. The vertical gray dotted lines represent, from left to right, the affinity thresholds for “strong binders” (50 nM) and “weak binders” (500 nM).

Both search methods produced the peak at 9 amino acids expected for HLA-I eluted peptides (Fig. 2D). We evaluated the performance of our HLA-I peptide elution in two ways: first, by assessing the sequence motifs of the eluted peptides, and second, by assessing the fraction of peptides predicted to bind the specific HLA alleles of each analyzed sample. Peptide motif deconvolution, performed using the GibbsCluster tool14, showed that the motifs of the eluted peptides largely overlapped with the expected motifs for the HLA alleles of each subject (Supplementary Fig. 1). However, as previously noted15, this method has limited specificity, which may arise from a small number of peptides or from high haplotype complexity, that is, when two or more HLA alleles in the haplotype exhibit overlapping or similar presentation motifs.

To address peptide specificity more thoroughly, we used the machine learning-based predictor MHCflurry16. On average, 90% of the 9mers were predicted to bind their respective HLA alleles (affinity percentile < 2) across all samples (Fig. 2E). Machine learning-based peptide-MHC specificity deconvolution showed a consistent presentation pattern for peptides identified using UniProt or StringTie (Supplementary Fig. 2).
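
The binder call used for Fig. 2E and the nM bands drawn in Fig. 2H amount to thresholding predictor outputs. The sketch below applies those cutoffs to predictions assumed to have been generated beforehand (for instance with MHCflurry); it does not call MHCflurry, and the `Prediction` record, peptide IDs, and values are hypothetical. Taking each peptide's best percentile across the sample's alleles is our assumption about how a per-sample binder is defined.

```python
from dataclasses import dataclass

# Cutoffs quoted in the Fig. 2 legend
PERCENTILE_CUTOFF = 2.0  # affinity percentile < 2 -> HLA-I ligand ("binder")
STRONG_NM = 50.0         # predicted affinity < 50 nM -> strong binder
WEAK_NM = 500.0          # predicted affinity < 500 nM -> weak binder


@dataclass
class Prediction:
    peptide: str
    allele: str
    affinity_nm: float          # predicted affinity in nM (lower = tighter)
    affinity_percentile: float  # allele-specific percentile rank


def affinity_class(pred):
    """Bin a single prediction by the nM thresholds drawn in Fig. 2H."""
    if pred.affinity_nm < STRONG_NM:
        return "strong"
    if pred.affinity_nm < WEAK_NM:
        return "weak"
    return "non-binder"


def binder_fraction(preds):
    """Fraction of peptides whose best percentile over the sample's alleles is < 2."""
    best = {}
    for p in preds:
        best[p.peptide] = min(best.get(p.peptide, float("inf")), p.affinity_percentile)
    return sum(v < PERCENTILE_CUTOFF for v in best.values()) / len(best)


# Hypothetical predictions for three peptides against a sample's alleles
preds = [
    Prediction("pep1", "HLA-A*02:01", 35.0, 0.1),
    Prediction("pep1", "HLA-B*07:02", 9000.0, 20.0),
    Prediction("pep2", "HLA-A*02:01", 320.0, 1.5),
    Prediction("pep3", "HLA-A*02:01", 4000.0, 8.0),
]
print(binder_fraction(preds), [affinity_class(p) for p in preds])
```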

Additionally, we observed that over 95% of the 9mers identified using StringTie were also identified using UniProt, whereas only 38.48% of the 9mers in the UniProt dataset were found in the search against the StringTie-assembled transcript database (Fig. 2F).
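
The two percentages reported for Fig. 2F are containment fractions computed in opposite directions, which is why they can differ widely when one set is much larger than the other. A minimal sketch with hypothetical placeholder identifiers standing in for 9mer sequences:

```python
def containment(a, b):
    """Fraction of the peptides in `a` that also occur in `b` (not symmetric)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical pooled 9mer sets from the two searches
uniprot = {f"u{i}" for i in range(10)}            # 10 UniProt identifications
stringtie = {f"u{i}" for i in range(4)} | {"s1"}  # 4 shared + 1 StringTie-only

print(containment(stringtie, uniprot))  # share of StringTie 9mers also found by UniProt
print(containment(uniprot, stringtie))  # share of UniProt 9mers also found by StringTie
```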

Next, we compared individual tumor samples. Peptide overlap analysis revealed 4 to 5 clusters, with the highest overlap occurring, as expected, between UniProt- and StringTie-derived datasets from the same samples or biological replicates (Supplementary Fig. 3A).

Because the dataset sizes varied, with StringTie-derived peptide sets approximately half the size of the UniProt-derived sets, we further examined directional overlaps to account for this imbalance. This analysis revealed four more clearly defined clusters and showed that StringTie-derived peptides were largely contained within the corresponding UniProt-derived datasets (Supplementary Fig. 3B).
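
The effect of set-size imbalance on overlap clustering can be illustrated by contrasting a symmetric index with a directional one. In this hypothetical sketch, a StringTie-derived set fully contained in a twice-as-large UniProt set reaches a Jaccard index of only 0.5 but a directional overlap of 1.0. Using the Jaccard index as the symmetric comparator is our choice for illustration and is not necessarily the index behind Supplementary Fig. 3A; dataset names and contents are placeholders.

```python
from itertools import product


def jaccard(a, b):
    """Symmetric overlap: shared / union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def directional(a, b):
    """Asymmetric overlap: share of `a` that is found in `b`."""
    return len(a & b) / len(a) if a else 0.0


def overlap_matrices(datasets):
    """Return (Jaccard, directional) matrices as nested dicts keyed by dataset name."""
    names = list(datasets)
    jac = {i: {} for i in names}
    dire = {i: {} for i in names}
    for i, j in product(names, repeat=2):
        jac[i][j] = jaccard(datasets[i], datasets[j])
        dire[i][j] = directional(datasets[i], datasets[j])
    return jac, dire


# Hypothetical sample S1: the StringTie set is a half-sized subset of the UniProt set
datasets = {
    "S1_UniProt": {f"p{k}" for k in range(8)},
    "S1_StringTie": {f"p{k}" for k in range(4)},
    "S2_UniProt": {f"p{k}" for k in range(6, 14)},
}
jac, dire = overlap_matrices(datasets)
print(jac["S1_StringTie"]["S1_UniProt"], dire["S1_StringTie"]["S1_UniProt"])
```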

To better understand the similarity among the immunopeptidomic profiles of different individuals regardless of the source database, we performed the overlap analysis separately for the datasets derived from each of the two searches. Patients’ immunopeptidomes overlapped in a consistent pattern regardless of the method (Supplementary Fig. 4A, B). Given the nature of immunopeptidomics, we hypothesized that this overlap reflected the similarity of HLA haplotypes among subjects. We therefore one-hot encoded each patient’s HLA alleles at 2-digit resolution and performed hierarchical clustering, which yielded the same clusters. This indicates that the observed peptide repertoires are heavily influenced by the MHC composition of each patient (Supplementary Fig. 4C, D): the more HLA molecules two subjects share, the greater the overlap expected between their immunopeptidomes. This further underlines the need for a cohort large enough to provide sufficient coverage of frequent HLAs.
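
A minimal sketch of the HLA-based clustering: alleles are truncated to 2-digit resolution, one-hot encoded per patient, and grouped by naive average-linkage agglomeration on Hamming distance. The genotypes are hypothetical, and the distance and linkage choices are assumptions rather than the settings used for Supplementary Fig. 4C, D; in practice a library routine (for example SciPy's hierarchical clustering) would typically replace the hand-written loop.

```python
from itertools import combinations


def two_digit(allele):
    """Reduce an allele name to 2-digit resolution, e.g. 'HLA-A*02:01' -> 'A*02'."""
    name = allele[4:] if allele.startswith("HLA-") else allele
    gene, fields = name.split("*")
    return f"{gene}*{fields.split(':')[0]}"


def one_hot(patients):
    """Map each patient to a 0/1 vector over the 2-digit alleles seen in the cohort."""
    coded = {p: {two_digit(a) for a in alleles} for p, alleles in patients.items()}
    vocab = sorted(set().union(*coded.values()))
    return {p: [int(v in s) for v in vocab] for p, s in coded.items()}


def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))


def average_linkage(vectors, k):
    """Naive agglomerative clustering (average linkage) down to k clusters."""
    clusters = [[name] for name in vectors]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(hamming(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical HLA-I genotypes for four patients
patients = {
    "P1": ["HLA-A*02:01", "HLA-A*03:01", "HLA-B*07:02", "HLA-B*44:02", "HLA-C*05:01", "HLA-C*07:02"],
    "P2": ["HLA-A*02:01", "HLA-A*01:01", "HLA-B*07:02", "HLA-B*44:03", "HLA-C*07:02", "HLA-C*16:01"],
    "P3": ["HLA-A*24:02", "HLA-A*11:01", "HLA-B*35:01", "HLA-B*51:01", "HLA-C*04:01", "HLA-C*15:02"],
    "P4": ["HLA-A*24:02", "HLA-A*11:01", "HLA-B*35:03", "HLA-B*51:01", "HLA-C*04:01", "HLA-C*14:02"],
}
clusters = average_linkage(one_hot(patients), k=2)
print(clusters)
```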

Interestingly, over 90% of the 9mers in each dataset had previously been identified as HLA-I ligands and reported in the IEDB database (Fig. 2G), further strengthening confidence in our ovarian cancer dataset. The main difference between peptides present in or absent from the IEDB repository was their predicted HLA-binding affinity: peptides absent from IEDB exhibited weaker binding affinity profiles than those previously annotated (Fig. 2H).
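
The comparison in Fig. 2G, H amounts to splitting each 9mer set by membership in a list of known ligands and comparing predicted affinities between the two groups. A minimal sketch, assuming a set of IEDB ligand sequences has already been exported (the export step is not shown) and that affinities come from a binding predictor; all names and values are hypothetical, and summarizing each group by its median is our choice.

```python
from statistics import median


def iedb_split(predicted_nm, iedb_ligands):
    """Split peptides by IEDB membership and summarize predicted affinities.

    predicted_nm: {peptide: predicted affinity in nM}
    iedb_ligands: set of sequences previously reported as HLA-I ligands
    """
    known = [nm for pep, nm in predicted_nm.items() if pep in iedb_ligands]
    novel = [nm for pep, nm in predicted_nm.items() if pep not in iedb_ligands]
    return {
        "fraction_in_iedb": len(known) / len(predicted_nm),
        "median_nm_in_iedb": median(known) if known else None,
        "median_nm_not_in_iedb": median(novel) if novel else None,
    }


# Hypothetical affinities and a hypothetical IEDB export
predicted_nm = {"pepA": 20.0, "pepB": 45.0, "pepC": 80.0, "pepD": 900.0, "pepE": 1500.0}
iedb_ligands = {"pepA", "pepB", "pepC"}
summary = iedb_split(predicted_nm, iedb_ligands)
print(summary)
```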

These findings collectively support the reliability of the discovered peptides in our ovarian cancer dataset.

Identification of abundant overexpressed genes in ovarian tumors

To further prioritize relevant candidate targets for the development of therapeutically effective vaccines, we sought to identify genes overexpressed in ovarian tumor tissue compared with healthy ovarian tissue. To this end, we conducted differential gene expression (DGE) analysis using RNA-seq data from healthy ovarian tissues in the GTEx repository and ovarian tumor samples from the TCGA database. To limit batch effects that could render the results uninterpretable, we specifically used the UCSC Xena Toil recompute results17. In this dataset, tumor and healthy samples were processed and normalized uniformly, which brings them into a comparable numerical space. We processed our own ovarian tumor RNA-seq data with the same quantification and normalization methods. We then performed principal component analysis (PCA) of the RSEM-normalized transcript counts of the eleven ovarian cancer subjects reported here, the Cancer Genome Atlas HGSC cohort (TCGA-OV), and the GTEx healthy ovarian tissue cohort (GTEX-OV). The ovarian cancer patients in this study clustered with the TCGA-OV samples while remaining distinct from the GTEX-OV cohort (Fig. 3A). These results support the tumor identity of the collected samples and show that meaningful gene expression differences can be obtained when proper precautions are taken to avoid over-interpretation.

Fig. 3: Compositional similarity of gene expression between sequenced tumors.

A Principal component analysis (PCA) plot showing that the gene expression patterns of the ovarian cancer patients in this study are more similar to those of the TCGA-OV subjects than to those of the GTEX-OV subjects. B Scatter plot of genes significantly upregulated in ovarian cancer tumor samples compared with healthy tissue (GTEx). Gray: genes with base mean > 50, s-value < 0.05, and log2 fold change > 1. Red: the 65 genes with base mean > 500, s-value < 0.001, and log2 fold change ≥ 4.

Next, we performed DGE analysis to explore genes overexpressed in tumor tissues compared with healthy controls. Using apeglm shrinkage, we identified 8311 differentially expressed genes in total: 4433 down-regulated and 3878 up-regulated in tumors (Fig. 3B). To prioritize candidates for subsequent immunogenicity screening, we focused on transcripts that were both strongly overexpressed in tumors and abundant (i.e., had high baseline expression). Applying very stringent criteria (s-value < 0.001, log2(shrunken FC) ≥ 4, i.e., at least 16-fold, and baseline mean expression ≥ 500), we identified 65 genes. This set included several known ovarian cancer-associated genes, such as MUC1, MUC16, MSLN, and CLDN1, supporting the robustness of our approach.
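
The stringent selection step can be sketched as a filter over a results table. We assume each row already carries a base mean, an apeglm-shrunken log2 fold change, and an s-value (in R, such a table would typically come from DESeq2's `lfcShrink` with the apeglm method and s-values requested); the `DGERow` record and gene names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class DGERow:
    gene: str
    base_mean: float  # mean normalized count across all samples
    log2_fc: float    # shrunken log2 fold change, tumor vs healthy
    s_value: float


def stringent_hits(rows, min_base_mean=500.0, max_s=1e-3, min_lfc=4.0):
    """Keep abundant, strongly upregulated genes, sorted by fold change."""
    hits = [r for r in rows
            if r.base_mean >= min_base_mean
            and r.s_value < max_s
            and r.log2_fc >= min_lfc]
    return sorted(hits, key=lambda r: r.log2_fc, reverse=True)


rows = [
    DGERow("GENE_A", 1200.0, 5.1, 1e-6),  # passes all filters
    DGERow("GENE_B", 80.0, 6.0, 1e-8),    # too lowly expressed
    DGERow("GENE_C", 900.0, 2.5, 1e-9),   # fold change too small
    DGERow("GENE_D", 650.0, 4.0, 2e-4),   # passes (log2 FC of exactly 4 = 16-fold)
]
hits = [r.gene for r in stringent_hits(rows)]
print(hits)
```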

Peptides selected using additional immunological criteria show promising immunogenicity profiles

To further prioritize the remaining peptides for experimental immunogenicity testing and to account for multiple criteria related to therapeutic potential, we applied several complementary approaches in parallel rather than sequentially (Fig. 4A and Supplementary Figs. 5, 6), followed by manual curation. First, we used MHCflurry, a state-of-the-art HLA-binding affinity predictor, to evaluate the allele specificity of each eluted peptide and to prioritize strong binders over weak binders or non-binders (potential contaminant peptides). Second, we used differential expression data to prioritize peptides derived from genes overexpressed in ovarian cancer. Third, we favored peptides empirically observed in multiple subjects in this study over those that appeared to be more “private”. Fourth, we prioritized peptides binding highly frequent HLA alleles, defined as alleles with a frequency above 1/100 in the “USA NMDP European Caucasian” population deposited in the Allele Frequency Net Database (AFND)18. Fifth, we prioritized peptides supported by both the StringTie and UniProt searches or by the StringTie search alone. Finally, we used a bioinformatic approach (HEX)13 to identify tumor peptides similar to pathogen-derived sequences. Additionally, we deprioritized peptides reported in the HLA Ligand Atlas19, as these are presented in healthy tissues and are therefore less likely to be immunogenic.
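
Because the criteria were applied in parallel and then curated manually, there is no single formula behind Table 1. Purely as an illustration of combining parallel criteria, the sketch below uses a hypothetical additive score with one point per favorable criterion and a penalty for presentation in healthy tissue; the field names, weights, and example values are all assumptions and do not reproduce the authors' curation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    affinity_percentile: float  # predictor percentile for the restricting allele
    gene_overexpressed: bool    # source gene passes the DGE filter
    n_patients: int             # tumors in which the peptide was eluted
    allele_frequency: float     # population frequency of the restricting allele
    in_stringtie: bool          # supported by the StringTie search (alone or with UniProt)
    pathogen_like: bool         # flagged as similar to a pathogen-derived sequence
    in_ligand_atlas: bool       # reported as presented on healthy tissue


def score(c):
    """Hypothetical additive score: one point per favorable criterion."""
    points = [
        c.affinity_percentile < 2,
        c.gene_overexpressed,
        c.n_patients > 1,
        c.allele_frequency > 0.01,  # frequency above 1/100
        c.in_stringtie,
        c.pathogen_like,
    ]
    penalty = 2 if c.in_ligand_atlas else 0
    return sum(points) - penalty


def rank(candidates):
    return sorted(candidates, key=score, reverse=True)


cands = [
    Candidate("pepA", 0.3, True, 6, 0.12, True, False, False),
    Candidate("pepB", 1.1, True, 2, 0.05, True, True, True),
    Candidate("pepC", 4.0, False, 1, 0.002, False, False, False),
]
ranking = [(c.peptide, score(c)) for c in rank(cands)]
print(ranking)
```

An additive score keeps every criterion visible and easy to reweight, which suits a shortlist that is reviewed manually afterwards.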

Fig. 4: Peptide immunogenicity assessment via ELISpot assay.

A Schematic listing the criteria used for peptide prioritization. B Interferon-gamma (IFN-γ) secretion by PBMCs from healthy donors (HDs) stimulated with the selected peptides. C IFN-γ secretion by PBMCs from ovarian cancer (OVA) patients stimulated with the same peptides. In B and C, each dot represents a single peptide (average of technical duplicates). D IFN-γ secretion by HD PBMCs stimulated with the selected peptides, deconvoluted at the peptide level; each dot represents a different HD. E IFN-γ secretion by OVA patients’ PBMCs stimulated with the selected peptides, deconvoluted at the peptide level; each dot represents a different patient. F Percent response rate per peptide for HDs (black) and OVA patients (magenta). The dotted line marks a 25% response rate for reference. G Cancer-specific responses for the peptides with tumor-specific responses (4, 8 and 13). Data for peptides 4 and 13 were analyzed using the Mann–Whitney test, whereas data for peptide 8 were analyzed using the Wilcoxon signed-rank test because all HD responses were 0 (*p value < 0.05). In all graphs showing ELISpot data, the dotted line on the y axis represents a response threshold of 23.3 spots/1 × 10⁶ PBMCs, as in ref. 12.

Combining all the above-described methods, we identified 13 candidate peptides for further immunological characterization (Table 1).

Table 1: Peptides selected for immunogenicity assessment

Next, to assess the immunogenicity of the selected peptides and evaluate their therapeutic potential, we synthesized the peptides at high purity and tested their ability to activate T cells in PBMCs obtained from healthy donors or ovarian cancer patients. Healthy donors were selected based on their HLA haplotypes, allowing us to test each peptide multiple times. The selected peptides induced T cell activation in PBMCs from healthy donors. Interestingly, on average, our peptides were less prone to activate PBMCs derived from male donors (Fig. 4B). Additionally, 4 out of 9 ovarian cancer patients exhibited T cell responses to one or more of the selected peptides (Fig. 4C). At the individual peptide level, some peptides did not induce T cell activation in any subject, while others elicited stronger responses in PBMCs from either healthy donors or patients (Fig. 4D, E). Peptides 5, 6, 7, and 10 triggered responses in more than 25% of both healthy donors and ovarian cancer patients (Fig. 4F). However, only peptide 8 elicited a significantly stronger response in ovarian cancer patients than in healthy donors, suggesting a tumor-specific activation profile (Fig. 4G).
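
Per-donor response calls and the per-peptide response rates plotted in Fig. 4F can be sketched as follows. The 23.3 spots per 10⁶ PBMCs threshold and the averaging of technical duplicates come from the Fig. 4 legend; subtracting an unstimulated control, the strict ">" comparison, and the cell number per well are assumptions, and all values are hypothetical.

```python
from statistics import mean

THRESHOLD = 23.3  # spots per 1e6 PBMCs (response threshold, Fig. 4 legend)


def per_million(spots, cells_per_well):
    """Scale a raw spot count to spots per 1e6 PBMCs."""
    return spots * 1_000_000 / cells_per_well


def donor_response(peptide_wells, control_wells, cells_per_well):
    """Mean of duplicate peptide wells minus mean of unstimulated wells (assumed)."""
    stim = mean(per_million(s, cells_per_well) for s in peptide_wells)
    ctrl = mean(per_million(s, cells_per_well) for s in control_wells)
    return max(stim - ctrl, 0.0)


def response_rate(responses):
    """Fraction of donors whose response exceeds the threshold."""
    return sum(r > THRESHOLD for r in responses) / len(responses)


# Hypothetical duplicate wells for one donor, 250,000 PBMCs per well
r1 = donor_response([12, 14], [2, 2], 250_000)
# Hypothetical responses of four donors to one peptide
rate = response_rate([r1, 0.0, 30.0, 10.0])
print(r1, rate, rate > 0.25)
```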

Overall, our results demonstrate that several peptides identified through immunopeptidomics exhibit promising immunogenicity profiles, suggesting their potential for use in developing therapeutic cancer vaccines for ovarian cancer.

Discussion

Despite an initial response to current therapies, most ovarian cancer patients experience recurrence and need further treatment. The development of chemotherapy-resistant disease poses a significant challenge, highlighting the need for new therapeutic interventions such as immunotherapy20,21.

Ovarian cancers present variable levels of T cell infiltration, which are generally considered to be low22,23. Overall, the presence of tumor-infiltrating lymphocytes (TILs) has been associated with increased progression-free survival (PFS) and overall survival (OS)24,25,26. Ovarian cancer is no exception: the presence of TILs, especially CD8+ T cells, indicates a good prognosis regardless of tumor stage24,27. Consequently, ovarian cancer is regarded as a promising target for immunotherapeutic approaches, including cancer vaccines.

Numerous clinical trials over the past three decades have investigated peptide vaccines for ovarian cancer treatment, yet with limited efficacy28.

Several unmet challenges hinder the development of effective peptide vaccines and require careful consideration in their design. One critical issue identified in previous trials, which may have hindered their outcomes, was the use of “classical” tumor antigens with poor tumor expression or limited evidence of presentation in the immunopeptidome29.

To address this, it is essential to identify peptides directly presented on HLA molecules at the surface of cancer cells30. Immunoaffinity purification of HLA complexes, followed by elution and identification of naturally presented HLA ligands by liquid chromatography and tandem mass spectrometry, represents the current state-of-the-art method for uncovering the immunopeptidome landscape of cancer cells31.

Integrating new sequencing data with large public databases can be challenging, particularly because of batch effects that complicate the interpretation of results32. To overcome this issue, we used the Toil dataset developed by Vivian et al., one of the most extensive and consistently analyzed repositories of human RNA-seq expression data17. In this dataset, tumor and healthy samples are integrated, normalized, and brought into a comparable numerical space, enabling more reliable and meaningful comparisons.

The low tumor mutational burden (TMB)22,23,33,34 of ovarian cancer makes targeting mutation-derived tumor-specific antigens (TSAs) impractical for vaccine development. In contrast, tumor-associated antigens (TAAs), which originate from aberrantly expressed proteins, are generally less immunogenic but more commonly shared among patients, enhancing their therapeutic potential.

For our analysis, we used transcripts assembled with StringTie. While the vast majority of peptides identified using the StringTie dataset were also found in the UniProt search, only about 48% of the peptides identified in the UniProt search were present in the StringTie-derived dataset. We believe this discrepancy is primarily technical rather than biological. Specifically, it likely reflects the larger search space of the proteogenomic database compared with the UniProt reference: the StringTie database is translated in six frames by PEAKS, which substantially enlarges the search space. Because a larger search space requires a higher score threshold to maintain the same false discovery rate, the number of peptide identifications decreases, consistent with findings reported by Chong et al.35.
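
To illustrate why six-frame translation inflates the search space, the toy sketch below translates a nucleotide sequence in all six frames with the standard genetic code and collects stop-free 9mer windows. It is not how PEAKS builds its database internally; the input sequences are arbitrary, and mass, charge, and FDR considerations are ignored.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; codons with N or other symbols become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Three forward frames followed by three reverse-complement frames."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (seq, rc) for offset in range(3)]


def candidate_9mers(seq):
    """Stop-free 9mer windows across all six frames (a crude search-space proxy)."""
    kmers = set()
    for frame in six_frames(seq):
        for segment in frame.split("*"):
            kmers.update(segment[i:i + 9] for i in range(len(segment) - 8))
    return kmers


print(six_frames("ATGAAATAG"))
print(sorted(candidate_9mers("ATG" * 9)))
```

Even in this tiny example, a reverse-strand frame contributes a 9mer that a single annotated reading frame would never produce; across a whole transcriptome, such windows add up to a much larger candidate pool.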

In future analyses, the search space could be refined by avoiding six-frame translation, for example, by using open reading frame (ORF) prediction tools such as TransDecoder (https://github.com/TransDecoder/TransDecoder), as implemented in other pipelines36.
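
As a rough illustration of replacing six-frame translation with ORF calling, the sketch below reports ATG-initiated, stop-terminated ORFs on the forward strand only, longest first. It is a deliberately simplified stand-in and not TransDecoder's algorithm, which also scores coding potential and can report partial ORFs. Restricting the search to the sense strand is our assumption, motivated by the strand-specific libraries described in Methods, and the 100-codon minimum is an assumed default.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end) 0-based half-open spans of forward-strand ORFs, longest first.

    An ORF runs from an ATG to the first in-frame stop codon (stop included);
    ORFs without a stop codon before the sequence end are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # coding codons, stop excluded
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs, key=lambda span: span[0] - span[1])


print(find_orfs("CCATGAAAGGGTAGCC", min_codons=3))
```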

In our study, we identified high-frequency candidate target epitopes presented across multiple ovarian cancer specimens, supporting their suitability for vaccine development. Reassuringly, many of our highly ranked peptide candidates have already been identified and patented for this specific purpose. For example, we identified the “ALKARTVTF” peptide, derived from CCR5, which is overexpressed 16-fold in ovarian tumors. This peptide was also predicted to be a strong binder and to be presented, and it was empirically observed in more than half of our subjects’ tumors. Interestingly, CCR5 shows low expression levels in multiple healthy tissues (Supplementary Fig. 5A, B) while being elevated in multiple cancer types. Indeed, in addition to ovarian cancer, CCR5 has high tumor-specific expression in breast, pancreatic, renal, testicular, and skin cancers (Supplementary Fig. 6)37. CCR5 has already been chosen as a target in multiple clinical trials and is well known as a hallmark of cancer progression38. Notably, it was patented for the treatment of ovarian cancer in 2017 (WO2018138257A1). However, although we observed increased T cell activation toward the CCR5 epitope in ovarian cancer patients compared with healthy donors, this difference was not statistically significant.

Another noteworthy target that supports the validity of our pipeline is peptide 9, “TYSEKTTLF,” an HLA-A24-restricted epitope derived from MUC16. MUC16 is largely undetectable in normal ovarian tissues and in many other healthy tissues (Supplementary Fig. 5A, B), yet it is highly expressed in over 80% of ovarian cancer samples (Supplementary Fig. 6)39. Despite this strong tumor-associated expression, we unexpectedly observed no difference in T cell reactivity to this epitope between ovarian cancer patients and healthy donors.

Additionally, we tested two peptides derived from aberrant splice variants: DRYLLVSQF from DDX42 and KYLTIYLQK from TRMT10B. DDX42 has been associated with tumor progression in epithelial ovarian cancers and with poor prognosis in HGSC40,41, whereas no specific link has been established between TRMT10B and ovarian cancer. DDX42 showed moderate, fairly consistent expression across multiple healthy tissues (Supplementary Fig. 5A, B), while TRMT10B showed low expression levels in normal tissues. Notably, both genes were expressed at higher levels in normal tissues than in tumors (Supplementary Fig. 6), suggesting that, based on conventional gene expression profiles, they may not be optimal candidates. However, the selected peptides originate from cancer-specific aberrant splice variants, which may confer tumor specificity not apparent from gene-level expression alone.

Interestingly, overexpression of TRMT10C, a related family member, has been linked to poorer survival in gynecologic cancers42. In contrast to TRMT10B, TRMT10C shows higher expression in multiple cancer types (Supplementary Fig. 6), highlighting a divergent expression pattern within this gene family.

Ultimately, among all the tested peptides, only VVHLIKNAY from ITGB2 elicited significantly stronger activation of PBMCs from cancer patients than from healthy donors, who showed no reactivity. Interestingly, this peptide was identified only in the search against the StringTie-assembled database and was missed by the UniProt search, highlighting the added value of our customized approach. The role of ITGB2 in ovarian cancer appears controversial. A comprehensive analysis by Li et al. demonstrated a close association between ITGB2 upregulation and ovarian tumorigenesis43. Increased ITGB2 expression has also been associated with an unfavorable clinical outcome and heightened immune cell infiltration, including M2 macrophages, and moderately correlates with CD4+/CD8+ T cell and B cell infiltration44. On the other hand, expression profiling revealed that ITGB2 is also moderately elevated in healthy lung tissue and highly expressed in the spleen (Supplementary Fig. 5A, B). Although ITGB2 is overexpressed in several tumor types (Supplementary Fig. 6), this signal is largely overshadowed by its high expression in healthy blood. While this did not appear to hinder its immunogenicity in ovarian cancer patients, this pattern raises concerns about potential toxicity and off-target immune responses.

Further research is required to determine whether the peptide-specific immune responses elicited by our vaccine candidates translate into effective tumor cell killing. The detection of immune activation in response to TAAs, which are typically considered poorly immunogenic, in a cohort of immunotherapy-naïve subjects is already a notable finding. However, while we have demonstrated the immunogenic potential of these peptides, we have not yet shown that the immune cells recognizing them can functionally eliminate tumor cells presenting them. This is a key limitation of the current study and a critical focus for future investigations.

An additional limitation of vaccination in ovarian cancer is the immunosuppressive tumor microenvironment (TME), which hampers the ability of T cells to kill and control malignant cells45. Indeed, ovarian cancers have been observed to express high levels of PD-L1, and TILs in these tumors frequently exhibit elevated LAG-3 expression33,46. To overcome this challenge, future investigations should explore combining cancer vaccine strategies with immune checkpoint blockade therapies to counteract TME-induced immunosuppression.

Methods

Patient samples

Samples were collected at the hospital from 28 ovarian cancer patients between mid-2020 and mid-2021, in accordance with the Declaration of Helsinki. All patients signed informed consent prior to sample collection. The study was approved by the Research Ethics Committee of the Northern Savo Hospital District (approval number 350/2020).

Ovarian tumor samples were dissected from primary ovarian tumors (pelvic location) and from metastatic nodes spread through the abdomen (omental and splenic locations). Fatty, necrotic, and bloody areas were removed, and the remaining cleaned tissue was snap-frozen in liquid nitrogen prior to HLA-bound peptide enrichment. A second, pathologically similar portion of the same tumor specimen was stored in RNAprotect Tissue Reagent (Qiagen, # 76106) prior to RNA extraction. Although 28 subjects were enrolled in the clinical study, only eleven had an HGSC diagnosis and sufficient metastatic omental tissue to be included in further analysis (Supplementary Table 1).

Purification of HLA class-I complexes

HLA-I–peptide complexes were immunoaffinity purified from ovarian tumor samples using an HLA-I antibody (anti-human HLA-A, HLA-B, HLA-C, clone W6/32, InVivoMab) following the method described by Bassani-Sternberg47 with minor modifications. Frozen tumor tissue was cut into 1 × 1–3 × 3 mm³ fragments and further mechanically dissociated using a gentleMACS™ Dissociator (without any enzymatic treatment). Dissociated tissue was lysed with 0.25% sodium deoxycholate, 0.2 mM iodoacetamide, 1 mM EDTA, 1 mM PMSF (phenylmethylsulfonyl fluoride), and 1% octyl-β-D-glucopyranoside in the presence of protease inhibitors in PBS at 4 °C for 2 h. The lysate was precleared (2000 × g, 5 min at 4 °C) and cleared by centrifugation at 20,000 × g for 30 min at 4 °C before loading onto the immunoaffinity column (AminoLink, Pierce) with covalently linked antibody. Following binding (overnight at 4 °C), the affinity column was washed with 10 column volumes of each buffer (150 mM NaCl, 20 mM Tris-HCl; 400 mM NaCl, 20 mM Tris-HCl; 150 mM NaCl, 20 mM Tris-HCl; and 20 mM Tris-HCl, pH 8.0), and bound complexes were eluted in 10% acetic acid.

Eluted peptide–HLA-I complexes were desalted using Sep-Pak C18 cartridges (Waters). The cartridges were prewashed with 80% acetonitrile in 0.1% trifluoroacetic acid (TFA) and with 0.1% TFA before sample loading. Samples were washed with 0.1% TFA, and peptides were eluted in 30% acetonitrile in 0.1% TFA before drying by vacuum centrifugation (Eppendorf).

LC-MS/MS analysis of HLA-I peptides and proteomics database search

LC-MS/MS and peptide identification were performed on a fee-for-service basis by CIC bioGUNE in Bilbao, Spain.

Samples (200 ng) were loaded onto a timsTOF Pro with PASEF (Bruker Daltonics) coupled online to either an Evosep ONE (Evosep) or a NanoElute (Bruker) liquid chromatograph. The 30 samples-per-day protocol (44-min gradients) was used with the Evosep ONE, whereas a custom 30-min gradient was used for the NanoElute runs. Three runs were performed for each sample: (1) a preliminary load of 1/20 of the sample to check sample load, (2) an adjusted sample load in which only z = 1 ions were analyzed, and (3) an adjusted sample load in which z > 1 ions were analyzed. Protein identification and quantification were carried out using the PEAKS X software (Bioinformatics Solutions). The three loads for each sample were summed and considered a single sample. Searches were carried out against either a Homo sapiens database (UniProt/Swiss-Prot) or a multi-sample, merged StringTie-generated expressed sequence tag (EST) ad hoc database (described in more depth below), with precursor and fragment tolerances of 20 parts per million (ppm) and 0.05 Da, respectively. HLA class I-presented peptides with precursor m/z between 400 and 650 and charge states below 4+ were considered for further analysis, in line with previous work48,49,50.
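
The search tolerances and the precursor filter can be expressed as simple checks. This sketch uses the standard monoisotopic proton mass; whether the 400–650 m/z bounds are inclusive and how PEAKS applies them internally are assumptions, and the 1000 Da neutral mass is an arbitrary example.

```python
PROTON_MASS = 1.007276  # Da, monoisotopic proton mass


def precursor_mz(neutral_mass, charge):
    """m/z of an [M + zH]z+ precursor ion."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def within_precursor_tolerance(observed, theoretical, tol_ppm=20.0):
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def within_fragment_tolerance(observed, theoretical, tol_da=0.05):
    return abs(observed - theoretical) <= tol_da


def keep_precursor(mz, charge, mz_min=400.0, mz_max=650.0, max_charge=3):
    """Retain precursors with 400 <= m/z <= 650 and charge below 4+."""
    return mz_min <= mz <= mz_max and 1 <= charge <= max_charge


mz = precursor_mz(1000.0, 2)
print(round(mz, 6), keep_precursor(mz, 2), keep_precursor(mz, 4))
print(within_precursor_tolerance(mz + 0.005, mz), within_precursor_tolerance(mz + 0.02, mz))
```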

Total RNA extraction, RNA sequencing and comparative analysis with GTEX and TCGA database

As previously mentioned, the portion of each tumor specimen dedicated to RNA-seq was stored in RNAprotect Tissue Reagent (Qiagen, #76106) prior to RNA extraction. Total RNA was extracted from ~30 mg of patient tumor tissue using the RNeasy Protect Cell Mini Kit (Qiagen, #74624) according to the manufacturer’s instructions. RNA sequencing was performed by Eurofins Genomics on an Illumina NovaSeq 6000 instrument (2 × 150 bp paired-end, strand-specific) to a minimum depth of 30 M reads. Raw data from healthy omentum and metastatic omentum were analyzed using the UCSC Toil RSEM gene expression pipeline (hg38, GENCODE v23)17 and merged with the GTEx (Genotype-Tissue Expression), TARGET, and TCGA (The Cancer Genome Atlas) Toil recompute archive [github, link]. This pipeline and reference pairing was chosen to keep the bioinformatic methods comparable between the new clinical samples and the established healthy-tissue datasets available in public databases.

RNA-sequencing of ovarian tumor tissues and AS identification

Each patient’s transcriptome was processed separately and subsequently merged into a “pan-transcriptome” using Stringtie in “--merge” mode. Splice junctions in the pan-transcriptome were annotated by merging them with their overlapping canonical isoforms (Gencode v23) using the “gffread” utility, and repeats were masked using “gffcompare”. For compatibility with the MS spectral identification software (PEAKS X), as described in its user manual (Section 6.1), splice junctions were converted into mature transcript sequences in EST format using gffread with the “-w” flag and the hg38 reference genome coordinates.
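Conceptually, converting annotated exon coordinates into a mature transcript sequence means concatenating the exon sequences in genomic order and reverse-complementing minus-strand transcripts. The sketch below illustrates that idea only; it is not gffread’s implementation. The function names are hypothetical, and exons are assumed to use 1-based inclusive coordinates, as in GTF/GFF files.

```python
from typing import Dict, List, Tuple

# Translation table for DNA complementation (upper and lower case, N kept).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def splice_transcript(genome: Dict[str, str], chrom: str,
                      exons: List[Tuple[int, int]], strand: str) -> str:
    """Join exon sequences (1-based, inclusive) into a mature transcript."""
    ordered = sorted(exons)  # genomic order, regardless of input order
    seq = "".join(genome[chrom][start - 1:end] for start, end in ordered)
    # Minus-strand transcripts are read from the reverse complement.
    return reverse_complement(seq) if strand == "-" else seq


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)
```

For example, with the toy genome `{"chr1": "AAACCCGGGTTT"}` and exons `(1, 3)` and `(7, 9)`, the plus-strand transcript is `AAAGGG`, and the minus-strand transcript is `CCCTTT`.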

To keep comparisons congruent, Stringtie was not run on a HISAT2 re-alignment (as best practices recommend), but on the same aligned reads matrix as above (“Total RNA extraction, RNA sequencing and comparative analysis with GTEX and TCGA database”).

HLA-typing

HLA typing was inferred from RNA-seq data using the ArcasHLA pipeline51. The ArcasHLA suite contains an explicit nucleotide sequence database constructed from the implicit HLA allele differences described by the International ImMunoGeneTics database (IMGT) consortium. ArcasHLA performs typing by extracting reads aligned to chromosome 6 and then using the Kallisto pseudoaligner to rapidly bin these reads against each reference sequence. The sequence, or pair of sequences, that matches the most reads is inferred to be the correct genotype.
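The final selection step (choosing the allele or allele pair that explains the most reads) can be illustrated with a toy example. This is a deliberately simplified sketch: ArcasHLA’s actual scoring is more elaborate, and representing each read as the set of alleles it is compatible with is an assumption made here for clarity. `best_allele_pair` is a hypothetical name.

```python
from itertools import combinations_with_replacement
from typing import Iterable, List, Set, Tuple


def best_allele_pair(read_hits: List[Set[str]],
                     alleles: Iterable[str]) -> Tuple[Tuple[str, str], int]:
    """Return the allele pair (homozygous pairs allowed) explaining most reads.

    A read counts as explained if it is compatible with either allele of the
    pair. Ties are resolved in favor of the first pair in sorted order.
    """
    best_pair, best_count = None, -1
    for pair in combinations_with_replacement(sorted(alleles), 2):
        pair_set = set(pair)
        explained = sum(1 for hits in read_hits if hits & pair_set)
        if explained > best_count:
            best_pair, best_count = pair, explained
    return best_pair, best_count
```

With two reads compatible only with A*01:01, two only with A*02:01 and one only with A*03:01, the pair A*01:01/A*02:01 explains four reads and is selected.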

A handful of subjects had multiple sequenced biological specimens. Specimens were typed independently, and the few conflicts observed were resolved in favor of the HLA type for which binding affinity data were available in MHCflurry.

Because our peptide immunoaffinity purification targeted only class I HLA molecules, only class I alleles were retained.

European ancestry was selected as a prior expectation for the purposes of identifying rare alleles.

To further maintain congruent comparisons, ArcasHLA was run on the same aligned reads matrix as above (“Total RNA extraction, RNA sequencing and comparative analysis with GTEX and TCGA database”). Patients’ HLA types are listed in Supplementary Table 2.

RNA-seq differential expression

RSEM values were imported either from the Xena precalculated RSEM isoform values or from the Toil pipeline “rsem_isoforms.results” files. Transcript-level values were imported with the “tximport” function in “scaledTPM” mode. Counts were normalized with DESeq2 to account for differences in sequencing depth. PCA biplots were created from the variance-stabilized values. Differential expression between the 11 new tumor samples and GTEx healthy ovarian tissue was calculated with DESeq2.
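The two normalization steps named above can be reproduced in a few lines of NumPy for illustration. This is a sketch under stated assumptions, not the R code used in the study: `scaled_tpm` mirrors the idea behind tximport’s “scaledTPM” mode (TPM scaled up to each sample’s library size), and `median_of_ratios_size_factors` mirrors DESeq2’s default median-of-ratios size factors. Matrices are assumed to be genes × samples.

```python
import numpy as np


def scaled_tpm(tpm, counts):
    """Scale each sample's TPM column so it sums to that sample's read count."""
    tpm = np.asarray(tpm, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return tpm / tpm.sum(axis=0) * counts.sum(axis=0)


def median_of_ratios_size_factors(counts):
    """DESeq2-style size factors (genes x samples input).

    Genes with a zero count in any sample are excluded, because their
    geometric mean would be zero.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    if not expressed.any():
        raise ValueError("no gene is expressed in every sample")
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)           # per-gene log geometric mean
    log_ratios = log_counts - log_geo_means[:, None]  # sample / pseudo-reference
    return np.exp(np.median(log_ratios, axis=0))      # one factor per sample
```

When every gene in sample 2 has exactly twice the counts of sample 1, the size factors come out as 1/√2 and √2, so dividing by them equalizes the two samples.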

Shrunken log2 fold changes were calculated with the “apeglm” model using an “lfcThreshold” value of 1. Genes were considered differentially expressed if their shrunken log2 fold change was greater than 1 and their base mean was greater than 40.
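The two selection thresholds can be expressed as a simple filter over DESeq2-style results. In this minimal sketch, the `DEResult` record and gene names are hypothetical, and the fold-change criterion is applied literally as stated (a positive shrunken log2 fold change above 1). If absolute changes in either direction were intended, `abs()` would be applied to the fold change.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DEResult:
    """Hypothetical per-gene summary of a DESeq2 result row."""
    gene: str
    base_mean: float     # DESeq2 baseMean
    shrunken_lfc: float  # apeglm-shrunken log2 fold change


def is_differentially_expressed(r: DEResult, lfc_min: float = 1.0,
                                base_mean_min: float = 40.0) -> bool:
    """Both thresholds are strict (greater than), as stated in the text."""
    return r.shrunken_lfc > lfc_min and r.base_mean > base_mean_min


def select_de_genes(results: Iterable[DEResult]) -> List[str]:
    """Names of the genes passing both thresholds, in input order."""
    return [r.gene for r in results if is_differentially_expressed(r)]
```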

Peptide overlap calculation

Overall overlap was calculated for every pair of samples as the number of peptides common to both sets (i.e., the intersection) divided by the total number of unique peptides identified in either set (i.e., the union), which is the Jaccard index.

Left overlap was likewise calculated for every pair of samples, but as the number of peptides common to both sets divided by the total number of peptides in only one of the two sets being compared (the left-hand set). Unlike the overall overlap, it is therefore asymmetric.
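Both overlap measures can be written directly as set operations. In this sketch, the function names are hypothetical and each sample is assumed to be represented as an iterable of peptide sequences. Note that left overlap depends on which set is on the left.

```python
from typing import Callable, Dict, Iterable, Tuple


def overall_overlap(a: Iterable[str], b: Iterable[str]) -> float:
    """|A intersect B| / |A union B| (Jaccard index); 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def left_overlap(left: Iterable[str], right: Iterable[str]) -> float:
    """|L intersect R| / |L|; 0.0 if the left set is empty."""
    left, right = set(left), set(right)
    return len(left & right) / len(left) if left else 0.0


def pairwise(samples: Dict[str, Iterable[str]],
             fn: Callable[[Iterable[str], Iterable[str]], float]
             ) -> Dict[Tuple[str, str], float]:
    """Apply fn to every ordered pair of distinct samples."""
    names = sorted(samples)
    return {(x, y): fn(samples[x], samples[y])
            for x in names for y in names if x != y}
```

With sample sets {P1, P2, P3} and {P2, P3, P4, P5}, the overall overlap is 2/5. The left overlap is 2/3 in one direction and 1/2 in the other.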

Class I MHC binding affinity prediction

Peptides identified by MS were filtered to those 8–15 amino acid residues in length. Each peptide’s MHC specificity was annotated with the predicted class I MHC binding affinity (nM), presentation probability score and processing probability score calculated by MHCflurry16.
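The length filter and per-allele annotation are sketched below. This sketch assumes that the affinity predictions have already been computed (for example with MHCflurry) and are available as a mapping from allele to predicted affinity in nM. A lower nM value means stronger predicted binding. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def filter_by_length(peptides: List[str], min_len: int = 8,
                     max_len: int = 15) -> List[str]:
    """Keep peptides whose length lies in [min_len, max_len]."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


def best_allele(predictions: Dict[str, float]) -> Tuple[str, float]:
    """Allele with the strongest (lowest nM) predicted affinity, and its value."""
    allele = min(predictions, key=predictions.get)
    return allele, predictions[allele]
```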

HEX ranking

Putative neoantigens 8–12 amino acid residues in length were ranked using the HEX pipeline. HEX is a proprietary solution owned by Helsinki University that uses a neural network (NetMHCpan 4.1b52) to rank binding affinity and a custom position-specific weight matrix (PWM) to assess a peptide’s similarity to pathogens that humans have encountered over their immunological history.

The peptide-binding domain of the HLA molecule accommodates approximately 9 residues, so we chose to assess viral similarity with a strong preference for the central portion of the peptide. This weighting is implemented in HEX’s custom PWM.

HEX also includes a preformatted pathogen database that provides the “hit” sequences against which putative neoantigens are compared, and uses NCBI BLASTp to perform this search. Although the HEX database can easily be substituted to support other pathogens, this work used only the default 17,705-record UniProt “Viruses [10239]” database. Candidate peptides were matched with potential viral orthologs by BLASTp (BLOSUM62)53. For each peptide with at least one viral ortholog, binding affinity was then predicted by NetMHCpan for each supplied HLA allele, and the peptide’s specificity was assigned to the MHC allele with the highest predicted binding affinity. Finally, an aggregate score was calculated to rank each peptide by its predicted probability of eliciting an immune response in the originating subject. When comparing tumor- and virus-derived peptide pairs, the average binding affinity of the two peptides was computed for display, and a pair was considered a good candidate only when the IC50 of both the tumor and the viral peptide was below 500 nM. Similarity to viral sequences was refined by pairwise positional-weighted alignment using the PMBEC54 substitution matrix, favoring similarity in the central portion of the peptides, as follows:

$$\text{final similarity score}=\text{Score of the central portion}+\text{Score of the side portion}$$
(1)

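where xᵢ is the PMBEC substitution score of the aligned residue pair at position i and Ω is the peptide length: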

$$\text{Score of the central portion}=\sum_{i=3}^{\Omega-1}\left(x_i\times 2\right)$$
(2)
$$\text{Score of the side portion}=0.5\left(x_1+x_2+x_{\Omega}\right)$$
(3)

Final similarity score was normalized according to the following formula:

$$\text{normalized similarity score}=\frac{a}{\sqrt{b+c}}$$
(4)

Where “a” is the similarity score calculated between each tumor and its cognate viral peptide, “b” and “c” are the similarity scores calculated by self-aligning the tumor peptide and the viral peptide respectively.
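Equations (1)–(4) and the 500 nM pair criterion can be sketched as follows. This is an illustrative reimplementation, not HEX’s code. PMBEC is not bundled here, so `identity_subst` is a placeholder substitution function (1 for identical residues, 0 otherwise). The peptides are assumed to be already aligned to equal length without gaps. `evaluate_pair` assumes that tumor and viral IC50 values are compared for the same HLA allele. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, Optional, Tuple

Subst = Callable[[str, str], float]


def identity_subst(a: str, b: str) -> float:
    """Placeholder for the PMBEC matrix: 1.0 if residues match, else 0.0."""
    return 1.0 if a == b else 0.0


def positional_score(p: str, q: str, subst: Subst) -> float:
    """Eqs. (1)-(3): positions 3..Omega-1 weighted 2; positions 1, 2, Omega weighted 0.5."""
    if len(p) != len(q):
        raise ValueError("peptides must be aligned to the same length")
    omega = len(p)
    if omega < 4:
        raise ValueError("peptide too short for a central/side split")
    x = [subst(a, b) for a, b in zip(p, q)]         # x[0] is position 1
    central = sum(2 * xi for xi in x[2:omega - 1])  # positions 3..Omega-1
    side = 0.5 * (x[0] + x[1] + x[omega - 1])       # positions 1, 2 and Omega
    return central + side


def normalized_similarity(tumor: str, viral: str, subst: Subst) -> float:
    """Eq. (4): a / sqrt(b + c), with b and c the self-alignment scores."""
    a = positional_score(tumor, viral, subst)
    b = positional_score(tumor, tumor, subst)
    c = positional_score(viral, viral, subst)
    return a / math.sqrt(b + c)


def evaluate_pair(tumor_ic50: Dict[str, float], viral_ic50: Dict[str, float],
                  cutoff_nm: float = 500.0) -> Optional[Tuple[str, float, bool]]:
    """Return (allele, mean IC50, candidate flag) for the best shared allele.

    A pair is a candidate on an allele only if both IC50 values are below the
    cutoff. Passing alleles are preferred, then the lowest mean IC50.
    """
    shared = sorted(tumor_ic50.keys() & viral_ic50.keys())
    rows = [(a, (tumor_ic50[a] + viral_ic50[a]) / 2,
             tumor_ic50[a] < cutoff_nm and viral_ic50[a] < cutoff_nm)
            for a in shared]
    if not rows:
        return None
    return min(rows, key=lambda r: (not r[2], r[1]))
```

With the identity placeholder, a perfectly matching 9-mer scores 6 × 2 + 0.5 × 3 = 13.5. A mismatch in the central portion costs 2, whereas a mismatch at position 1 costs only 0.5, which reflects the central weighting described above.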

Epitope selection for immunological analysis

Peptides eluted from ovarian cancer specimens were searched against the HLA Ligand Atlas database19, the reference repository of HLA ligands presented by healthy tissues. Peptides found in the HLA Ligand Atlas were considered less likely to be immunogenic, because they are presented by healthy tissues and are therefore probably subject to central tolerance.

Additionally, peptides were searched against the IEDB T Cell database55. The presence of a peptide in this database was used only to gather information about prior investigations of that peptide.

The MHC allele specificities of the identified peptides were further annotated with population allele frequency data for the US and European populations from the Allele Frequency Net Database (AFND, allelefrequencies.net)18. Annotations used the “USA NMDP European Caucasian” dataset (n = 1,242,890), retrieved in December 2020. Before this information was added to the peptide results, HLA alleles were filtered to those with a population frequency greater than 1 in 100. Peptides restricted to more frequent alleles were favored over those specific to rarer alleles. When a peptide had multiple allele specificities, the cumulative sum of the corresponding allele frequencies was used.
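The annotation and ranking logic (drop alleles at or below 1% frequency, sum the frequencies of each peptide’s remaining alleles, then rank) is sketched below. The allele frequencies used in the check are made-up placeholders, not AFND values, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def filter_common_alleles(freqs: Dict[str, float],
                          min_freq: float = 0.01) -> Dict[str, float]:
    """Keep alleles with a population frequency greater than 1 in 100."""
    return {a: f for a, f in freqs.items() if f > min_freq}


def cumulative_frequency(peptide_alleles: Iterable[str],
                         common: Dict[str, float]) -> float:
    """Sum of frequencies of a peptide's alleles; filtered-out alleles add 0."""
    return sum(common.get(a, 0.0) for a in set(peptide_alleles))


def rank_peptides(peptide_to_alleles: Dict[str, List[str]],
                  freqs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Peptides sorted by cumulative allele frequency, highest first."""
    common = filter_common_alleles(freqs)
    scores = {p: cumulative_frequency(al, common)
              for p, al in peptide_to_alleles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The cumulative sum follows the text. It is a ranking heuristic and is not the same as the fraction of individuals who carry at least one of the alleles.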

PBMCs purification

Ten to thirty mL of blood was collected from each patient immediately before surgical tumor resection in tubes coated with heparin salts. PBMCs were purified using Leucosep™ tubes according to the manufacturer’s instructions, then counted and cryopreserved in human AB serum (Sigma) containing 10% DMSO (Sigma).

Healthy donor PBMCs

Healthy donor PBMCs were purchased from CTL (Bonn, Germany). Donors were selected based on their HLA types to optimize peptide immunogenicity testing.

Peptide synthesis

Peptides selected for immunogenicity testing (listed in Table 1) were synthesized by GenScript (Netherlands) at 4 mg and 90% purity.

IFN-γ ELISpot assay

T cell epitope-specific activation was detected using commercially available IFN-γ ELISpot reagents (ImmunoSpot, Bonn, Germany) according to the manufacturer’s instructions. Each subject was tested only with the peptides matching their HLA haplotype. Each subject’s PBMCs were seeded at the highest density possible given the number of stimuli, but always below 600,000 cells/well. Seeded PBMCs were stimulated in vitro with 2 µg/well of each peptide at 37 °C for 72 h. After the 3-day stimulation, the number of cytokine-producing, antigen-specific T cells was evaluated using an ELISpot reader system (ImmunoSpot), and all spot counts were normalized to 1 × 10⁶ seeded PBMCs. A response was deemed positive if the normalized IFN-γ ELISpot count exceeded a minimum threshold of 23.3 spots per 1 × 10⁶ PBMCs, as previously described12.
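The normalization and positivity call reduce to simple arithmetic. In this sketch, the function names are hypothetical. No background (unstimulated-well) subtraction is applied, because none is described above.

```python
POSITIVITY_THRESHOLD = 23.3  # spots per 1 x 10^6 PBMCs


def spots_per_million(spot_count: float, cells_seeded: int) -> float:
    """Normalize a well's spot count to 1 x 10^6 seeded PBMCs."""
    if cells_seeded <= 0:
        raise ValueError("cells_seeded must be positive")
    return spot_count * 1_000_000 / cells_seeded


def is_positive(spot_count: float, cells_seeded: int,
                threshold: float = POSITIVITY_THRESHOLD) -> bool:
    """True if the normalized count exceeds the positivity threshold."""
    return spots_per_million(spot_count, cells_seeded) > threshold
```

For example, 10 spots in a well seeded with 300,000 PBMCs corresponds to about 33.3 spots per 10⁶ cells and would be called positive. In contrast, 6 spots from 600,000 cells (10 per 10⁶) would not.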