Background & Summary

Prostate cancer is the most common malignancy of the male urogenital system and one of the most frequently diagnosed cancers among men worldwide1,2. Despite the development of novel treatments, therapeutic options remain limited, particularly for advanced disease such as metastatic castration-resistant prostate cancer (mCRPC)3. Immunotherapy has shown substantial benefit in various solid tumors and hematologic malignancies, but its efficacy in mCRPC has been underwhelming4. This may be attributable to the immunologically ‘cold’ tumor microenvironment (TME) of prostate cancer, which is characterized by low levels of tumor-infiltrating lymphocytes and abundant immunosuppressive cells5. Converting these immunologically cold tumors into ‘hot’ ones may therefore provide new strategies to improve immunotherapy outcomes.

Oncolytic virotherapy represents a new generation of cancer immunotherapy. Oncolytic viruses selectively infect and lyse cancer cells, thereby exposing tumor antigens and ultimately activating tumor-specific immunity. This process can convert immunosuppressive cold tumors into hot tumors, enhancing the effectiveness of immunotherapy6. OH2, a novel oncolytic herpes simplex virus type 2, has shown efficacy in various tumor types, including gastric cancer, colorectal cancer, and osteosarcoma7. Moreover, OH2 has been shown to activate systemic immunity and substantially remodel the tumor microenvironment8. However, early clinical studies of OH2 detected upregulated PD-L1 expression alongside increased T-cell infiltration, suggesting that the impact of oncolytic viruses on the tumor microenvironment is more complex than currently recognized9.

To comprehensively investigate how oncolytic viruses influence the prostate cancer TME, we applied an integrated multi-omics approach incorporating transcriptomics, metabolomics, and single-cell sequencing. This dataset provides a resource for investigating OH2-induced changes in the prostate cancer TME. Limitations of single-cell analysis, such as potential technical noise and cell type heterogeneity, should be considered when interpreting these data10.

Methods

Ethics statement

All procedures were performed in a specific pathogen-free laminar flow animal facility. After arrival, the mice were housed and observed for one week before any procedures were performed. Two main subcutaneous tumor models were used in this study. All animal experiments were conducted in accordance with relevant institutional and national guidelines and were approved by the Institutional Animal Care and Ethics Committee of Tongji Hospital (Approval number: TJH-202312027).

Oncolytic virus

OH2 was provided by Binhui Biopharmaceutical Co., Ltd. (Wuhan, China). OH2 is derived from the HG52 strain of herpes simplex virus type 2; the ICP34.5 gene was deleted to reduce neurotoxicity to normal cells, and the ICP47 gene was deleted to reduce viral immune evasion. The virus was titrated on Vero cells and stored at −80 °C. The titer was determined as 10⁸ TCID50/mL.

Cells and Mice

The human prostate cancer cell lines 22Rv-1 (Boster Bio, Wuhan) and C4-2 (Sunell Biotechnology, Suzhou) were cultured in RPMI-1640 medium (Gibco, Thermo Fisher Scientific). The murine RM-1 prostate cancer cell line was grown in DMEM (Gibco, Thermo Fisher Scientific). All media were supplemented with 10% fetal bovine serum and maintained at 37 °C in a 5% CO2 humidified atmosphere. The 22Rv-1 and C4-2 human prostate cancer cell lines were selected as they represent models of androgen-responsive and castration-resistant prostate cancer, respectively. The murine RM-1 cell line was chosen for its ability to form tumors in immunocompetent C57BL/6 mice, allowing for the study of OH2-induced effects within a functional immune microenvironment.

For in vivo studies, male immunocompetent C57BL/6 mice and immunodeficient BALB/c nude mice (4–6 weeks old) were acquired from Vital River Laboratory Animal Technology (Beijing). These animals were housed under specific pathogen-free conditions with a 12-hour light/dark cycle and used for tumor xenograft experiments following a 7-day acclimatization period.

Sample preparation

Cell samples for transcriptomics and metabolomics were collected 24 hours after treatment with the oncolytic virus OH2 (MOI = 5). The metabolomics data were derived from cell culture supernatants rather than cell lysates (Fig. 1). Following OH2 treatment, cytopathic effects, including cell rounding and detachment, were observed by microscopy, and cell viability was assessed with a CCK-8 assay prior to sample collection. Animal samples for transcriptomics and single-cell sequencing were obtained from subcutaneous xenograft tumors derived from RM-1 cells, as detailed below:

Fig. 1 Sample acquisition diagram.

RM-1 cells (1 × 10⁵ per mouse) were collected, resuspended in PBS, and inoculated subcutaneously into the left axilla of male C57BL/6 mice aged 4–6 weeks. After 7 days, the mice were randomized into 2 groups (negative control [NC] group and OH2 group) using a random number table so that initial tumor volumes were comparable, and treatment was initiated. The NC group received intratumoral (i.t.) injections of 100 μL PBS on days 0, 3, and 6 (every 3 days for a total of 3 treatments). The OH2 group received 100 μL OH2 (i.t., 10⁸ TCID50/mL) on days 0, 3, and 6 (every 3 days for a total of 3 treatments) (Fig. 1). Tumor volume was measured every two days using a digital caliper and calculated using the formula: Volume = (Length × Width²)/2.
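The caliper formula and the per-injection dose follow directly from the protocol above; the short sketch below only restates that arithmetic. The function names are hypothetical, and lengths are assumed to be in millimetres. With 100 μL of a 10⁸ TCID50/mL stock, each injection corresponds to 10⁷ TCID50.

```python
def tumor_volume_mm3(length_mm: float, width_mm: float) -> float:
    """Caliper-based tumor volume: V = (L x W^2) / 2 (mm^3 when L and W are in mm)."""
    return length_mm * width_mm ** 2 / 2


def tcid50_per_injection(titer_per_ml: float, volume_ul: float) -> float:
    """Infectious dose in one injection: titer (TCID50/mL) x injected volume (mL)."""
    return titer_per_ml * volume_ul / 1000.0


# Injection days stated in the protocol: every 3 days, 3 treatments in total.
DOSING_DAYS = [0, 3, 6]

if __name__ == "__main__":
    print(tumor_volume_mm3(10.0, 8.0))       # 320.0
    print(tcid50_per_injection(1e8, 100.0))  # 10000000.0
```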

Transcriptome Profiling

Total RNA was isolated from specimens using TRIzol reagent (Invitrogen, USA) following the manufacturer’s protocol. RNA purity and quantity were evaluated with a NanoDrop 2000 spectrophotometer (Thermo Fisher, USA), and integrity was confirmed with an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). Sequencing libraries were prepared using the VAHTS Universal V6 RNA-seq Library Prep Kit according to the manufacturer’s instructions, followed by paired-end (150 bp) sequencing on an Illumina NovaSeq 6000 platform. The raw sequencing data were quality-controlled using fastp (version 0.23) to remove adapters and low-quality reads11. The resulting clean reads were aligned to the mouse reference genome (mm10) or the human reference genome (GRCh38) using HISAT2 (version 2.2.1)12. Gene expression levels were quantified and normalized as FPKM (fragments per kilobase of transcript per million mapped reads):

$${\rm{FPKM}}=\frac{{Counts}\times {10}^{9}}{{\rm{Effective\; Length}}\times {\rm{Total\; Mapped\; Reads}}}$$
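The FPKM formula above can be sketched per gene as follows. This is an illustrative implementation, not the pipeline's own code; the gene names and numbers are hypothetical, and for paired-end data the counts are usually fragments (read pairs) rather than individual reads.

```python
from __future__ import annotations


def fpkm(count: float, effective_length_bp: float, total_mapped_reads: float) -> float:
    """FPKM = count x 10^9 / (effective length in bp x total mapped reads)."""
    if effective_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("effective length and total mapped reads must be positive")
    return count * 1e9 / (effective_length_bp * total_mapped_reads)


def fpkm_table(counts: dict[str, float], lengths: dict[str, float],
               total_mapped_reads: float) -> dict[str, float]:
    """Apply the FPKM formula to every gene of one sample."""
    return {gene: fpkm(c, lengths[gene], total_mapped_reads) for gene, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments, 2,000 bp effective length, 10 million mapped reads.
    print(fpkm(500, 2000, 10_000_000))  # 25.0
```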

Principal component analysis (PCA) was performed on all protein-coding genes retained after filtering and FPKM calculation. The sample correlation heatmap was generated using the Euclidean distance metric and complete-linkage hierarchical clustering.
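Complete linkage defines the distance between two clusters as the largest Euclidean distance between any pair of their members. The naive sketch below illustrates that rule on toy two-gene profiles; the sample names and values are hypothetical (for example, log2(FPKM + 1) values, which is an assumption), and in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` would normally be used.

```python
from __future__ import annotations

import math
from itertools import combinations


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles: dict[str, list[float]]) -> list[tuple[frozenset, frozenset, float]]:
    """Naive agglomerative clustering with complete linkage.

    Returns the merge history as (cluster_a, cluster_b, merge height).
    """
    def cluster_distance(c1: frozenset, c2: frozenset) -> float:
        # Complete linkage: maximum pairwise distance between members.
        return max(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters under complete linkage.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


if __name__ == "__main__":
    toy = {"NC_1": [1.0, 1.0], "NC_2": [1.0, 2.0], "OH2_1": [5.0, 5.0], "OH2_2": [5.0, 7.0]}
    for a, b, height in complete_linkage(toy):
        print(sorted(a | b), round(height, 3))
```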

Dual-Modality Metabolomics Analysis

GC-MS Methodology

Metabolite separation was performed using an Agilent 7890B GC system coupled with a 5977A MSD (Agilent Technologies, USA). A DB-5MS capillary column (30 m × 0.25 mm × 0.25 μm) was used with helium as the carrier gas (1 mL/min constant flow). The oven temperature program was as follows: initial hold at 60 °C (0.5 min), ramp to 125 °C at 8 °C/min, to 210 °C at 5 °C/min, to 270 °C at 10 °C/min, and finally to 305 °C at 20 °C/min with a 5 min hold. MS parameters were: ion source 230 °C, quadrupole 150 °C, 70 eV electron ionization energy, and m/z 50–500 scan range.
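The oven program above implies a run time of 0.5 + 65/8 + 85/5 + 60/10 + 35/20 + 5 = 38.375 min, assuming no holds at the intermediate temperatures (none are stated) and excluding any post-run or equilibration time. The helper below is a hypothetical sketch of that arithmetic.

```python
from __future__ import annotations


def oven_run_time(initial_temp_c: float, initial_hold_min: float,
                  ramps: list[tuple[float, float, float]]) -> float:
    """Total GC oven program time in minutes.

    ramps: sequence of (rate in °C/min, target temperature in °C, hold at target in min).
    """
    total, temp = initial_hold_min, initial_temp_c
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp duration plus hold at the target
        temp = target
    return total


# Program described above; intermediate holds are assumed to be 0 min.
PROGRAM = [(8, 125, 0), (5, 210, 0), (10, 270, 0), (20, 305, 5)]

if __name__ == "__main__":
    print(oven_run_time(60, 0.5, PROGRAM))  # 38.375
```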

LC-MS/MS Methodology

Analysis was performed on a Waters UPLC I-Class Plus system interfaced with a Q-Exactive HF mass spectrometer (Thermo Fisher, USA). Separation used an HSS T3 column (1.8 μm, 2.1 × 100 mm) with 0.1% formic acid in water (A) and acetonitrile (B) under the following gradient: 5% B (0–2 min), 30% B (4 min), 50% B (8 min), 80% B (10 min), 100% B (14–15 min), and re-equilibration at 5% B (15.1–16 min). MS settings were: resolution 70,000 (full scan) and 17,500 (MS/MS), spray voltage 3.8 kV (positive ion mode) and −3.2 kV (negative ion mode), and m/z 100–1,000 scan range.
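The gradient table above can be read as a piecewise function of time. The sketch below interpolates the mobile-phase composition between the stated breakpoints, assuming linear segments (a linear gradient curve is an assumption; the text does not specify the curve type). The names are hypothetical.

```python
from __future__ import annotations

# (time in min, %B) breakpoints transcribed from the gradient above.
GRADIENT = [(0.0, 5.0), (2.0, 5.0), (4.0, 30.0), (8.0, 50.0), (10.0, 80.0),
            (14.0, 100.0), (15.0, 100.0), (15.1, 5.0), (16.0, 5.0)]


def percent_b(t_min: float, program: list[tuple[float, float]] = GRADIENT) -> float:
    """Mobile-phase %B at time t, assuming linear changes between breakpoints."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]  # after the last breakpoint, hold the final composition


if __name__ == "__main__":
    for t in (1.0, 3.0, 12.0, 14.5):
        print(t, percent_b(t))
```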

Qualitative analysis

XCMS v4.5.1 preprocessing

Prior to pattern recognition, the metabolomics software XCMS v4.5.1 was employed for raw data preprocessing. This encompassed baseline filtering, peak detection, integration, and retention time correction.

MS-DIAL v4.24 analysis

For systematic small-molecule characterization, GC-MS data acquired in Full Scan mode were processed using MS-DIAL (version 4.90) for mass spectral deconvolution and compound identification/quantification. The analytical pipeline consisted of:

Data conversion

Raw .D files were converted to ABF format using AnalysisBaseFileConverter to improve processing efficiency.

Feature extraction

The software algorithmically generated model peaks from chromatograms while suppressing background interference through adaptive noise reduction.

Compound annotation

Metabolite identification integrated three orthogonal parameters: retention index matching, mass spectral similarity, and reference database alignment.

Data processing workflow

The workflow comprised automated peak detection, deconvolution of co-eluting compounds, retention time alignment, and gap filling for missing values.

Database

Compound identification was based on multiple dimensions, including retention time (RT), exact mass, secondary fragments, and isotope distribution. Identification was performed using the HMDB, LIPID MAPS (v2.3), and METLIN databases and the LuMet Animal 3.0 local database.

LuMet Animal 3.0 is an LC-MS/MS database from OE Biotech Co., Ltd., built using standardized instrumentation and reference standards. It covers more than 10,000 key metabolites, including 2,000 reference standards with RT, MS1, and MS2 data. It spans 13 major metabolite classes, such as amino acids, organic acids, and carbohydrates, and key metabolic pathways such as glycolysis, the TCA cycle, and phenylalanine metabolism.

For GC-MS qualitative analysis, we used the LuMet GC 5.0 database from OE Biotech Co., Ltd. This continuously updated untargeted GC-MS database contains 2,543 detectable metabolites, covers an m/z range of 85–650, and includes compound classes such as lipids and amino acids. Entries are annotated with HMDB, KEGG, and CAS identifiers to facilitate functional research.

For headspace sampling and wax component detection in GC-MS experiments targeting volatile compounds, we used the NIST database (https://webbook.nist.gov/chemistry/) for qualitative and quantitative analysis. Although it is primarily an EI-MS database, the latest NIST version also includes ESI MS/MS spectra of small-molecule compounds such as metabolites, lipids, and bioactive peptides.

Data processing and analysis

LC-MS/MS data processing

The extracted LC-MS/MS data underwent rigorous preprocessing to ensure data quality and reliability. The steps included RSD filtering, missing value processing, zero-value replacement, score-based filtering, data merging, and log2 conversion. Ion peaks with an RSD exceeding 0.3 were removed to ensure consistency across samples. Ion peaks with over 50% missing (zero) values within a group were discarded. Remaining zero values were replaced with half of the minimum ion intensity across all samples. After addressing zero values, log2 transformation was applied to stabilize variance and normalize the data. Compounds identified through qualitative analysis were evaluated based on a scoring system (maximum 80 points). Those scoring below 36 were deemed unreliable and excluded from further analysis.
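The filtering steps above (RSD filter, missing-value filter, half-minimum zero replacement, log2 transformation) can be sketched as follows. Several details are assumptions not fixed by the text and are labelled in the code: the RSD is computed on QC injections, an ion is dropped if any single group exceeds 50% zeros, and "half of the minimum" is taken per ion. The score-based filter is omitted, and all ion IDs and intensities are hypothetical.

```python
from __future__ import annotations

import math
import statistics


def preprocess_lcms(features: dict[str, list[float]], groups: dict[str, list[int]],
                    qc_idx: list[int], rsd_max: float = 0.3,
                    max_missing: float = 0.5) -> dict[str, list[float]]:
    """Filter and transform an ion-intensity matrix.

    features: {ion_id: intensity per sample}; groups: {group: sample indices};
    qc_idx: indices of QC injections used for the RSD filter (assumption).
    """
    out = {}
    for ion, values in features.items():
        qc = [values[i] for i in qc_idx]
        qc_mean = statistics.mean(qc)
        # 1. RSD filter: drop ions whose RSD exceeds 0.3.
        if qc_mean == 0 or statistics.stdev(qc) / qc_mean > rsd_max:
            continue
        # 2. Missing-value filter: drop ions with >50% zeros in any group (assumption: "any").
        if any(sum(values[i] == 0 for i in idx) / len(idx) > max_missing
               for idx in groups.values()):
            continue
        # 3. Zero replacement: half of this ion's minimum non-zero intensity (assumption).
        fill = min(v for v in values if v > 0) / 2
        # 4. log2 transformation after zero replacement.
        out[ion] = [math.log2(v if v > 0 else fill) for v in values]
    return out


if __name__ == "__main__":
    groups = {"NC": [0, 1], "OH2": [2, 3]}
    qc = [4, 5, 6]
    data = {
        "ion_a": [100, 200, 400, 0, 200, 210, 190],  # kept; the zero in OH2 is imputed
        "ion_b": [0, 0, 50, 60, 100, 100, 100],      # dropped: NC is 100% missing
        "ion_c": [10, 10, 10, 10, 100, 10, 50],      # dropped: QC RSD > 0.3
    }
    print(preprocess_lcms(data, groups, qc))
```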

GC-MS data processing

For GC-MS data, internal standards were used for quality control. Preprocessing included removal of internal standard peaks and known false-positive peaks from the data matrix. As for the LC-MS/MS data, ion peaks with more than 50% missing (zero) values within a group were removed, remaining zero values were replaced with half of the minimum value, and log2 transformation was applied after zero-value processing. All samples were normalized by internal standard segmented normalization: for each sample, the data were divided into segments, and each segment was normalized to a selected internal standard whose peak areas had an RSD below 0.1 across all samples. The normalized value was calculated as (segment sample peak area / segment internal standard peak area) × mean internal standard peak area across all samples. For cell, bacterial, and algal samples, total peak area normalization was additionally applied after internal standard segmented normalization, with the normalized value calculated as (sample peak area / total peak area) × mean total peak area of all samples.
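The two normalization formulas above can be sketched as follows. The sketch assumes that each "segment" is a retention-time segment assigned a single internal standard, and that the RSD check uses the sample standard deviation; both are interpretations, and all feature names and peak areas are hypothetical.

```python
from __future__ import annotations

import statistics


def rsd(values: list[float]) -> float:
    """Relative standard deviation (sample SD / mean)."""
    return statistics.stdev(values) / statistics.mean(values)


def is_segment_normalize(peaks: dict[str, list[float]], segment_of: dict[str, str],
                         is_areas: dict[str, list[float]],
                         max_is_rsd: float = 0.1) -> dict[str, list[float]]:
    """Internal-standard segmented normalization.

    peaks: {feature: peak area per sample}; segment_of: {feature: segment id};
    is_areas: {segment id: area of that segment's internal standard per sample}.
    value = (sample peak area / sample IS area) x mean IS area across samples.
    """
    for seg, areas in is_areas.items():
        if rsd(areas) >= max_is_rsd:
            raise ValueError(f"internal standard for segment {seg!r} has RSD >= {max_is_rsd}")
    out = {}
    for feat, areas in peaks.items():
        istd = is_areas[segment_of[feat]]
        mean_is = statistics.mean(istd)
        out[feat] = [a / s * mean_is for a, s in zip(areas, istd)]
    return out


def total_area_normalize(peaks: dict[str, list[float]]) -> dict[str, list[float]]:
    """Extra step for cell samples: (peak area / sample total area) x mean total area."""
    n_samples = len(next(iter(peaks.values())))
    totals = [sum(areas[j] for areas in peaks.values()) for j in range(n_samples)]
    mean_total = statistics.mean(totals)
    return {f: [a / t * mean_total for a, t in zip(areas, totals)]
            for f, areas in peaks.items()}


if __name__ == "__main__":
    peaks = {"m1": [100.0, 200.0], "m2": [50.0, 55.0]}
    norm = is_segment_normalize(peaks, {"m1": "s1", "m2": "s1"}, {"s1": [1000.0, 1100.0]})
    print(total_area_normalize(norm))
```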

Score screening

After normalization, redundant features were removed and peaks were merged. Compounds were then screened by qualitative score using a threshold of 70 points (out of a maximum of 100); compounds scoring below 70 were considered unreliable and removed, yielding a refined data matrix.

Data merging

The processed LC-MS/MS positive and negative ion data, along with GC-MS data, were combined into a comprehensive data matrix. This matrix included all essential information extracted from the raw data, providing a robust foundation for subsequent analyses.

Identification level criteria for LC-MS/MS:

Level 1: Retention time within ±0.3 min (18 s) and fragmentation score ≥ 45 points.

Level 2: Retention time within ±0.3 min (18 s) and 0 ≤ fragmentation score < 45 points.

Level 3: Fragmentation score ≥ 45 points.

Level 4: Fragmentation score < 45 points.

GC-MS:

Level 1: Score ≥ 85 points.

Level 2: 80 ≤ Score < 85 points.

Level 3: 70 ≤ Score < 80 points.
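The level definitions above can be expressed as two small classifiers. For LC-MS/MS, the text gives no RT condition for Levels 3 and 4, so the sketch assumes they apply when no RT match within ±0.3 min is available; for GC-MS, scores below 70 return None to mirror the 70-point screening cut. Function names are hypothetical.

```python
from __future__ import annotations


def lcms_level(rt_delta_min: float | None, frag_score: float) -> int:
    """Identification level for an LC-MS/MS annotation.

    rt_delta_min: observed minus library retention time (min), or None when the
    library entry has no RT (assumption: Levels 3 and 4 cover this case and RT mismatches).
    """
    rt_matched = rt_delta_min is not None and abs(rt_delta_min) <= 0.3  # ±0.3 min = ±18 s
    if rt_matched:
        return 1 if frag_score >= 45 else 2
    return 3 if frag_score >= 45 else 4


def gcms_level(score: float) -> int | None:
    """Identification level for a GC-MS annotation; None means removed at the 70-point cut."""
    if score >= 85:
        return 1
    if score >= 80:
        return 2
    if score >= 70:
        return 3
    return None


if __name__ == "__main__":
    print(lcms_level(0.2, 50), lcms_level(None, 60), gcms_level(82), gcms_level(65))
```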

Single-Cell RNA sequencing

Single-cell libraries were constructed using the SeekOne®MM platform. Cells were gravity-sedimented into microwells (170,000 per chip), followed by capture with barcoded beads (CBBs) via magnetic sedimentation. After cell lysis, bead-bound RNA underwent reverse transcription (37 °C, 30 min) and second-strand synthesis. Amplified cDNA was indexed with Illumina adapters and sequenced on NovaSeq 6000 (PE150) or DNBSEQ-T7 (PE100) platforms.

Sequencing data quality control

Raw sequencing data underwent quality assessment and preprocessing using Fastp (v0.23.2) with the following optimized parameters:

Read trimming protocol

Implemented a 1-bp sliding window quality filter (5’ → 3’ direction); Base removal threshold: Phred score < 3 (per-window average); Automatic adapter detection for paired-end reads; Removal of terminal N nucleotides (TRAILING algorithm equivalent)

Read length filtering

Minimum length cutoff: 60 bp; Reads shorter than the cutoff after trimming were discarded

Quality metrics

Calculated pre- and post-filtering base quality distributions; Generated per-cycle sequence content statistics; Reported adapter contamination levels.
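One reading of the trimming parameters above is a Trimmomatic-style sliding window of 1 bp scanned 5′ → 3′ that truncates the read at the first base below Phred 3, followed by removal of terminal N bases and a 60 bp length filter. The sketch below implements that interpretation only; it is not fastp's internal algorithm, adapter trimming is omitted, and the function name is hypothetical.

```python
from __future__ import annotations


def trim_read(seq: str, quals: list[int], window: int = 1, min_mean_q: float = 3,
              min_len: int = 60) -> tuple[str, list[int]] | None:
    """Quality-trim one read; return (seq, quals) or None if it fails the length filter."""
    # Sliding window scanned 5' -> 3': truncate at the first window with mean Phred < min_mean_q.
    cut = len(seq)
    for i in range(len(seq) - window + 1):
        if sum(quals[i:i + window]) / window < min_mean_q:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]
    # Remove terminal N bases from the 3' end.
    while seq.endswith("N"):
        seq, quals = seq[:-1], quals[:-1]
    # Length filter: discard reads shorter than min_len after trimming.
    if len(seq) < min_len:
        return None
    return seq, quals


if __name__ == "__main__":
    read = "A" * 70 + "C" * 10
    phred = [30] * 70 + [2] + [30] * 9  # base 71 falls below Phred 3
    print(trim_read(read, phred))
```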

Single-cell RNA sequencing data processing

After low-quality reads were filtered, single-cell library quality control was performed using SeekSoul®Tools. Clean reads were assessed, aligned to the reference genome, and used for gene expression quantification to generate a preliminary cell-gene expression matrix.

Each sequenced read carries a cell barcode and a UMI. Reads with the same barcode originate from the same cell, and among these reads, those sharing the same UMI originate from the same molecule. The barcode- and UMI-tagged sequencing data were aligned to the reference genome using the STAR aligner13, and featureCounts was used for expression quantification14. Reads with identical barcodes and UMIs, being duplicates of the same molecule, were collapsed to generate a preliminary expression matrix.
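The barcode/UMI collapsing described above amounts to counting distinct UMIs per (cell barcode, gene). The sketch below shows that idea only; it performs no UMI error correction (production pipelines often merge UMIs that differ by one base), and the barcodes, UMIs, and genes in the example are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def count_umis(records: Iterable[tuple[str, str, str]]) -> dict[tuple[str, str], int]:
    """Collapse reads to molecules and count them per (cell barcode, gene).

    records: (cell_barcode, umi, gene) for each read assigned to a gene.
    Reads sharing barcode, UMI and gene are treated as duplicates of one molecule.
    """
    molecules: dict[tuple[str, str], set[str]] = defaultdict(set)
    for barcode, umi, gene in records:
        molecules[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


if __name__ == "__main__":
    reads = [("AAAC", "UMI1", "Cd3e"), ("AAAC", "UMI1", "Cd3e"),
             ("AAAC", "UMI2", "Cd3e"), ("GGTT", "UMI1", "Cd74")]
    print(count_umis(reads))
```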

This initial matrix includes both cellular and background barcodes. To obtain the final cell expression matrix, we filtered the preliminary matrix using methods inspired by Cell Ranger and EmptyDrops. First, an expected cell number (N, default 3,000) was set, and barcodes were sorted by UMI count in descending order. Two filtering methods were then applied. The first method took the 99th percentile of the UMI counts of the top N barcodes as the maximum estimated UMI count (m) and selected barcodes with UMI counts exceeding m/10 as captured cells. The second method distinguished cells from background based on RNA expression profiles, recovering cells with low RNA content. The results of the two methods were combined to produce a comprehensive cell expression dataset for downstream analysis. The raw gene expression matrix was processed with the DoubletFinder package (version 2.0.4) to filter out potential doublets.
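The first (Cell Ranger-inspired) filtering method above can be sketched as follows: take the 99th percentile of the top-N barcodes' UMI counts as m and call barcodes with more than m/10 UMIs as cells. The percentile is computed here with linear interpolation via `statistics.quantiles(..., method="inclusive")`, which is an assumption about the exact percentile definition; the second, EmptyDrops-style method is not shown. Barcodes and counts are hypothetical.

```python
from __future__ import annotations

import statistics


def call_cells_by_umi(umi_counts: dict[str, int], expected_cells: int = 3000) -> set[str]:
    """First-pass cell calling on total UMI counts per barcode.

    m = 99th percentile of the UMI counts of the top-N barcodes; barcodes with
    more than m / 10 UMIs are called as cells.
    """
    top = sorted(umi_counts.values(), reverse=True)[:expected_cells]
    if not top:
        return set()
    m = top[0] if len(top) == 1 else statistics.quantiles(top, n=100, method="inclusive")[98]
    return {bc for bc, n in umi_counts.items() if n > m / 10}


if __name__ == "__main__":
    counts = {"bc1": 10000, "bc2": 9000, "bc3": 8000, "bc4": 7000, "bc5": 6000,
              "bc6": 900, "bc7": 1200, "bc8": 50, "bc9": 10, "bc10": 5}
    print(sorted(call_cells_by_umi(counts, expected_cells=5)))
```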

Data Records

All data records generated in this study have been deposited in public repositories. The dataset comprises the following components: Transcriptomic and scRNA-seq raw data are accessible via GEO datasets (GSE298912, GSE298913, GSE298914, GSE298915)15,16,17,18. Metabolomics datasets are archived in NGDC OMIX (OMIX010418)19.

Transcriptomics Data (Bulk RNA-seq): GSE298912, GSE298913, GSE298914. These series contain raw FASTQ files and processed gene expression matrices for in vivo mouse tumor samples (RM-1, NC vs. OH2 group) and in vitro cell samples (22Rv-1 and C4-2, Control vs. OH2-treated). Each processed data matrix is a table in which rows correspond to genes (id column) and columns correspond to samples, with expression values provided as FPKM. The files are named FPKM_Tumorsamples.txt, FPKM_C42samples.txt, and FPKM_RV1samples.txt.

Single-Cell RNA Sequencing Data: GSE298915. This series contains the raw FASTQ files from the SeekOne®MM platform and the processed gene-cell count matrix. The filtered count matrix is provided in a standard format, containing the counts of unique molecular identifiers (UMIs) for each gene (rows) in each cell (columns). The processed data files are named filtered_feature_bc_matrix_nc_1.zip, filtered_feature_bc_matrix_nc_2.zip, filtered_feature_bc_matrix_nc_3.zip, filtered_feature_bc_matrix_OH_1.zip, filtered_feature_bc_matrix_OH_2.zip, and filtered_feature_bc_matrix_OH_3.zip.

Metabolomics Data: OMIX010418. This metabolomics dataset mainly includes raw data acquired by gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-tandem mass spectrometry (LC-MS/MS). The GC-MS data comprise raw data files (.D format). The LC-MS/MS data comprise raw data files (.raw format) acquired in both positive ion mode (pos) and negative ion mode (neg). The data files are grouped by sample type, including cell samples (C_), mouse xenograft samples (R_), and quality control samples (QC).

Technical Validation

Transcriptome sequencing

RNA integrity assessment

Assessment of RNA quality and concentration was conducted with a NanoDrop 2000 spectrophotometer and Agilent 2100 Bioanalyzer.

RNA-seq raw data quality and clean reads mapping

After quality control, an average of 79.79 million high-quality reads per sample were obtained (Table 1). The alignment statistics to the reference genome are summarized in Table 2 and Fig. 2a.

Table 1 Summary of bulk RNA-seq sequencing data and quality control metrics.
Table 2 Genome mapping statistics for bulk RNA-seq data.
Fig. 2 Bulk transcriptome sequencing. (a) Each sector represents the number of genes in a sample, radiating from a central pole. Different colors are used to differentiate the ranges of gene numbers. (b) Different colors indicate different ranges of FPKM values. The x-axis represents the samples, while the y-axis shows the number of protein-coding genes. (c) In PCA analysis, the closer the distance between samples, the higher their similarity. Samples from different groups are distributed in distinct regions of a two-dimensional space, while samples within the same group tend to cluster together in space. (d) The x-axis and y-axis represent the sample names. The color intensity indicates the distance between samples, with darker colors representing closer distances.

Gene expression level and sample correlation analysis

The expression values of the samples can be divided into different ranges. The number of genes in each expression range can be calculated and visualized using a stacked bar chart (Fig. 2b). Specifically, the analysis for the 22Rv-1 dataset included 27,673 genes, the C4-2 dataset included 29,706 genes, and the RM-1 in vivo tumor dataset included 23,624 genes. By using the quantification results of genes for Principal Component Analysis (PCA), we can examine the distribution of samples, explore the relationships between samples, or verify the experimental design (Fig. 2c). By employing clustering methods to calculate the distances between samples, the similarity among samples can be assessed (Fig. 2d).

Metabolomics sequencing

LC-MS/MS and GC-MS data were evaluated using identification level scores to assess the quality of compound identification (Fig. 3a,c). Metabolite classification statistics are shown in Fig. 3b,d. Boxplots of metabolite intensity for each sample are shown in Fig. 3e,f; the height of each box reflects the degree of data dispersion. PCA was used to examine the distribution of samples, explore the relationships between samples, and verify the experimental design (Fig. 3g,h).

Fig. 3 Metabolomics sequencing. (a) Bar chart of the number of metabolites at each identification level for cell lines. (b) Pie chart of the proportions of metabolite classes for cell lines. (c) Bar chart of the number of metabolites at each identification level for subcutaneous transplant tumors. (d) Pie chart of the proportions of metabolite classes for subcutaneous transplant tumors. (e & f) The x-axis represents the sample ID, and the y-axis represents the expression level of the samples. The plots reflect the distribution of metabolite abundance values in each sample and are used to assess quality control (QC) consistency and within-group repeatability. (g & h) In PCA, the closer the distance between samples, the higher their similarity. Samples from different groups are distributed in distinct regions of the two-dimensional space, while samples within the same group tend to cluster together.

Single-cell RNA Sequencing

The sequencing quality control results are shown in Table 3, and the single-cell library quality control results are shown in Table 4. A total of 3,422 cells were excluded during quality control based on the following criteria: mitochondrial gene content >20%, fewer than 300 or more than 7,000 detected genes, or UMI counts below 10,000 (Fig. 4a). After preprocessing, 61,713 high-quality cells were retained. To account for cell cycle heterogeneity, cells were scored for the expression of S-phase and G2/M-phase genes, and these scores were regressed out (Fig. 4b).
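The cell-level exclusion criteria above translate into a simple keep/drop rule. In the sketch below, the mitochondrial percentage is computed from genes whose symbols start with "mt-", which is the usual mouse naming convention but an assumption here; function names and example counts are hypothetical.

```python
from __future__ import annotations


def percent_mt(gene_counts: dict[str, int], prefix: str = "mt-") -> float:
    """Percentage of a cell's UMIs from mitochondrial genes (mouse symbols start with 'mt-')."""
    total = sum(gene_counts.values())
    mt = sum(n for gene, n in gene_counts.items() if gene.startswith(prefix))
    return 100.0 * mt / total if total else 0.0


def passes_qc(n_genes: int, n_umis: int, pct_mt: float, min_genes: int = 300,
              max_genes: int = 7000, min_umis: int = 10000, max_pct_mt: float = 20.0) -> bool:
    """Keep a cell unless mt% > 20, genes < 300 or > 7,000, or UMIs < 10,000."""
    return (min_genes <= n_genes <= max_genes) and n_umis >= min_umis and pct_mt <= max_pct_mt


if __name__ == "__main__":
    pct = percent_mt({"mt-Co1": 200, "Actb": 800})
    print(pct, passes_qc(2500, 12000, pct))
```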

Table 3 Single-cell RNA sequencing sample-level sequencing metrics.
Table 4 Single-cell RNA sequencing library quality control and cellular expression summary.
Fig. 4 Single-cell RNA Sequencing. (a) Violin plots showing the percentage of mitochondrial genes, hemoglobin genes and number of unique genes per cell, split by sample. nFeature_RNA: This plot shows the distribution of the number of unique RNA features detected in each cell across different samples. The width of the plot at each point represents the density of cells with a specific number of features. nCount_RNA: This plot illustrates the distribution of the total number of RNA counts per cell, indicating the overall RNA expression level in each cell. percent.mt: This plot displays the percentage of mitochondrial RNA in each cell, which can be an indicator of cell stress or quality. percent.HB: This plot shows the percentage of hemoglobin RNA in each cell, which is relevant for blood cell analysis. (b) Scatter plot displaying the proportion of cells in different cell cycle phases. This scatter plot uses PCA to visualize the distribution of cells according to their cell cycle phases. Different colors represent different cell cycle phases, helping to identify how cells are distributed across these phases. (c) Scatter plot displaying the distribution of cells in different samples. This scatter plot also uses PCA to show the distribution of cells across different samples. Each color represents a different sample, allowing for the visualization of sample-specific clustering or mixing. (d) Cell clustering of different samples after dimensionality reduction using the UMAP method. This plot uses UMAP to reduce the dimensionality of the data and visualize cell clustering for each sample separately. Each color represents a different sample, showing how cells from the same sample cluster together. (e) Cell clustering of all samples after dimensionality reduction using the UMAP method. This plot combines all samples and applies UMAP to visualize the overall cell clustering. Different colors represent different samples, showing how cells from various samples mix or cluster together. (f) Cell annotation results of all samples after UMAP dimensionality reduction. This plot shows the results of cell annotation for each sample using UMAP. Different colors represent different cell types identified within each sample, providing insights into the cellular composition of each sample. (g) Dot plot displaying marker genes for cell annotation. This dot plot displays marker genes used for cell annotation. The size of the dot indicates the percentage of cells expressing the gene, and the color intensity represents the average expression level of the gene across cells expressing it. Different colors represent different cell identities.

To integrate the six samples, 3,000 integration features were selected, and Harmony (version 1.2.3) was applied for batch correction (Fig. 4c). Clustering was performed at a resolution of 0.8, and dimensionality reduction was performed using UMAP (Fig. 4d,e). Cell identities were determined using differentially expressed genes identified with the COSG package (version 0.9.0) together with canonical marker genes reported in previous studies (Fig. 4f,g and Table 5).

Table 5 Canonical marker genes used for cell type annotation.
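As a simplified illustration of marker-based annotation, each cluster can be assigned the cell type whose canonical markers have the highest mean expression in that cluster. This is not the COSG method used above, only a sketch of the marker-matching step; the marker lists below are illustrative examples rather than the contents of Table 5, and the cluster expression values are hypothetical.

```python
from __future__ import annotations


def annotate_clusters(cluster_means: dict[str, dict[str, float]],
                      markers: dict[str, list[str]]) -> dict[str, str]:
    """Label each cluster with the cell type whose markers have the highest mean expression.

    cluster_means: {cluster: {gene: mean expression}}; markers: {cell type: [genes]}.
    Genes missing from a cluster's profile are treated as 0 expression.
    """
    labels = {}
    for cluster, expr in cluster_means.items():
        scores = {ct: sum(expr.get(g, 0.0) for g in genes) / len(genes)
                  for ct, genes in markers.items()}
        labels[cluster] = max(scores, key=scores.get)
    return labels


if __name__ == "__main__":
    # Illustrative markers only (not Table 5).
    markers = {"T cells": ["Cd3d", "Cd3e"], "Macrophages": ["Cd68", "Adgre1"]}
    means = {"0": {"Cd3e": 2.5, "Cd3d": 2.0, "Cd68": 0.1},
             "1": {"Cd68": 3.0, "Adgre1": 1.5, "Cd3e": 0.05}}
    print(annotate_clusters(means, markers))
```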