Methods

Sample selection

Tumor samples were obtained from retinoblastoma patients undergoing primary enucleation (PE). No patients received prior systemic or intraocular therapy. Enucleation status and clinical details are provided in Supplementary Table 1.

Data acquisition and preprocessing

The single-cell RNA sequencing (scRNA-seq) dataset of retinoblastoma, comprising 10 individual RB samples, was obtained from the Gene Expression Omnibus (GEO) under the accession numbers GSE249995 and PRJNA737188 (https://www.ncbi.nlm.nih.gov/geo/). Additionally, bulk RNA-seq data were retrieved from GEO under the accession number GSE57362.

Clustering, dimension reduction, and cell annotation of scRNA-seq

The scRNA-seq data were processed using the “Seurat” R package (version 4.2.0)1. The dataset was normalized via the log-normalization method, and the top 2,000 highly variable genes were identified with the “FindVariableFeatures” function. Subsequently, the data were scaled using the “ScaleData” function, and the “RunHarmony” function was employed to mitigate potential batch effects2. Dimensionality reduction was then performed through principal component analysis (PCA) with the “RunPCA” function, restricted to the highly variable genes. The first 20 principal components (dims = 20) were selected for downstream clustering, which was achieved using the “FindNeighbors” and “FindClusters” functions at a resolution of 0.5. To identify marker genes across the 10 identified subgroups, the “FindAllMarkers” function was employed with thresholds of log2 fold change > 0.25 and a minimum percentage (min.pct) of 0.25. Marker genes were retained at an adjusted p value < 0.05.
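As an illustration of the log-normalization step, the following minimal Python sketch reproduces the per-cell formula ln(1 + count / total × scale factor). It is not the Seurat code used in this study; the scale factor of 10,000 is assumed from the Seurat default, and the gene counts are invented for the example.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the raw counts of one cell.

    Each count is divided by the cell's total count, multiplied by the
    scale factor (assumed here to be the Seurat default of 10,000),
    and transformed with the natural log of (1 + x).
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell has no signal to normalize.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }

# Hypothetical raw counts for a single cell (illustrative values only).
cell = {"RB1": 0, "ARR3": 30, "MKI67": 10, "GAPDH": 60}
normalized = log_normalize(cell)
```

Genes with zero counts remain at zero after normalization, and the ordering of expression levels within a cell is preserved.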

Sub-clustering of CP cell subpopulations

To further characterize the heterogeneity within the cone precursor (CP) cluster, we performed sub-clustering analysis using the Seurat R package. First, cells within the CP cluster were extracted and re-processed following the standard Seurat workflow, including “NormalizeData”, identification of 2,000 highly variable features (“FindVariableFeatures”, vst method), and “ScaleData”. Principal component analysis (PCA) was then performed, and the top principal components were used for UMAP dimensionality reduction. Clustering was carried out using the “FindNeighbors” and “FindClusters” functions at a resolution of 0.4 to identify finer subpopulations.

Identification of the malignant cone precursor cells

The “InferCNV” package (version 1.6.0) was employed to infer copy number variants (CNVs) in CP (Cone Precursor) cells using immune cells as the reference group. The analysis was performed with default parameters. To estimate the CNV signal for individual CP cells, a 100-gene sliding window approach was applied. Prior to analysis, genes with a mean count of less than 0.1 across all cells were filtered out to reduce noise. The CNV signal was further denoised using a dynamic threshold derived from the mean signal intensity. Based on the CNV accumulation scores, CP cells were classified into distinct groups to identify potential malignant or abnormal cell populations.

We calculated a CNV score for each cell based on the “inferCNV” output matrix. Specifically, we first subtracted 1 from each CNV value to center the data, then calculated the average of the absolute values across all genes for each cell. This CNV score reflects the extent of deviation from diploid status, thus providing a quantitative measure of chromosomal abnormality.
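The CNV score described above can be sketched as follows. This is a minimal Python illustration of the stated arithmetic (center at 1, take absolute values, average across genes), not the code used in the study; the cell names and CNV values are hypothetical.

```python
def cnv_score(cnv_values):
    """Mean absolute deviation of inferred CNV values from the diploid baseline of 1."""
    deviations = [abs(value - 1.0) for value in cnv_values]
    return sum(deviations) / len(deviations)

# Hypothetical rows of an inferCNV-style output matrix (one list of gene values per cell).
cnv_matrix = {
    "cell_diploid": [1.0, 1.0, 1.0, 1.0],
    "cell_altered": [1.2, 0.8, 1.0, 1.4],
}

scores = {cell: cnv_score(values) for cell, values in cnv_matrix.items()}
```

A cell with no inferred gains or losses scores 0, and both gains and losses increase the score because deviations are taken in absolute value.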

Pseudo-time trajectory analysis

“CytoTRACE” is a computational tool that estimates the developmental potential of single cells by measuring transcriptional diversity. Cells with higher “CytoTRACE” scores are inferred to be less differentiated and more progenitor-like. This approach facilitates the reconstruction of cellular differentiation trajectories and is especially useful for identifying immature cell populations within complex tissues. Pseudo-time trajectory analysis was performed to investigate the developmental dynamics of cone precursor cells. Differentially expressed genes (DEGs) were first identified using the Seurat package. The R package Monocle (version 2.4) was subsequently used to construct the pseudo-time trajectory using default parameters3. Dimensionality reduction for pseudo-time ordering was performed using the “reduceDimension” function, with the maximum number of components set to 2 and the reduction method specified as “DDRTree”. To identify genes significantly associated with developmental progression, the “differentialGeneTest” function was employed.

Cell–cell communication analysis

Intercellular communication was investigated using “CellPhoneDB” (version 2.0.0), which computes the significance of cell–cell interactions by analyzing ligand–receptor pairs based on normalized expression matrices and permutation testing4. Ligand–receptor interactions with nominal p values < 0.05 were retained for further analysis. To explore interactions in greater depth, the “NicheNet” framework was applied. This approach links ligands expressed in one cell type to target genes expressed in another, enabling the identification of key signaling pathways influencing specific cellular behaviors.

Comparison of cell–cell interactions in invasive versus non-invasive RB

Differential ligand–receptor interactions between invasive and non-invasive RB samples were assessed using the Mann–Whitney U test. Interaction scores for each ligand–receptor pair across all cell types were compared between the two groups. To account for multiple hypothesis testing, adjusted p values were calculated using the false discovery rate (FDR) method proposed by Benjamini and Hochberg. Ligand–receptor pairs with adjusted p values less than 0.05 were considered statistically significant.
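The Benjamini–Hochberg adjustment can be illustrated with a short Python sketch. It is a generic implementation of the procedure, not the code used in this study, and the nominal p values and pair labels below are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical nominal p values for four ligand-receptor pairs.
pairs = ["pair_A", "pair_B", "pair_C", "pair_D"]
nominal = [0.01, 0.04, 0.03, 0.20]

adjusted = benjamini_hochberg(nominal)
significant = [pair for pair, q in zip(pairs, adjusted) if q < 0.05]
```

In this toy example three pairs have nominal p values below 0.05, but only one remains below the threshold after adjustment, which is why the adjusted values rather than the nominal ones are used for calling significance.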

Molecular gene classifications of the RB patients

The “ConsensusClusterPlus” R package (version 1.72.0) was applied to explore the molecular classification of the bulk RNA-seq cohort based on expression patterns5. Unsupervised consensus clustering was based on the k-means algorithm, with 80% of the data resampled in each of 1,000 iterations. The optimal number of clusters was determined from the relative change in the area under the consensus cumulative distribution function (CDF) curves. The prognosis of patients in the two subgroups was then compared using Kaplan–Meier survival analysis. To further investigate the relationships between the ligand–receptor pair genes and clinical characteristics, we compared the histopathological features of the two subgroups.

Gene set enrichment analysis and functional annotation

Differentially expressed genes (DEGs) were analyzed for functional enrichment using the R package “clusterProfiler”. The Gene Ontology (GO) analysis covered three main domains: cellular components (CC), biological processes (BP), and molecular functions (MF). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment was also conducted to identify key pathways associated with the DEGs. To explore pathways linked to different molecular subtypes, Gene Set Enrichment Analysis (GSEA) was performed using gene sets from the Molecular Signatures Database (MSigDB).

Cell-type and immune infiltration assessment between subgroups

To estimate the proportions of various cell types within tissues, cell-type deconvolution was performed using the “CIBERSORT” algorithm, which analyzes input gene expression matrices to infer immune cell compositions. Immune cell proportions were compared across groups to identify distinct immune profiles. To explore the association between risk scores and infiltrating immune cells, Spearman correlation analysis was conducted. The results of the correlation analysis, along with the differences in immune cell abundances, were visualized using the “ggplot2” package in R.

Weighted gene co-expression network analysis and module gene selection in invasive vs. non-invasive RB

WGCNA was used to build gene co-expression networks and identify modules distinguishing invasive and non-invasive RB6. Outlier samples were removed using “goodSamplesGenes”, and a scale-free network was constructed from the top 10,000 highly expressed genes. The optimal soft threshold (β) was determined with “pickSoftThreshold”. Modules were identified via “blockwiseModules”, and sample clustering was visualized with “plotDendroAndColors”. Module correlations were shown with a labeled heatmap, and associations between modules were illustrated with “plotEigengeneNetworks”. Gene significance (GS) and module membership (MM) correlations were calculated, and module-specific genes were extracted for further analysis.

Statistical analysis

All statistical analyses were performed in R (version 4.2.0). Spearman’s correlation was used for correlation analysis. The Wilcoxon test was applied to examine differences between the two risk groups. Statistical significance was defined as P < 0.05.
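For readers unfamiliar with rank-based correlation, the sketch below shows how Spearman’s coefficient can be computed: values are converted to ranks (ties receive their average rank) and the Pearson correlation of the ranks is taken. It is a plain Python illustration rather than the R routine used in this study, and the input vectors are hypothetical.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks of x and y."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rank_x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in rank_y))
    return covariance / (spread_x * spread_y)

# Hypothetical risk scores and immune cell fractions (illustrative values only).
risk_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
cell_fraction = [0.02, 0.04, 0.08, 0.16, 0.32]
rho = spearman(risk_scores, cell_fraction)
```

Because only ranks enter the calculation, any strictly monotonic relationship, linear or not, yields a coefficient of 1 or -1.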

Cell culture and cell transfection

The Y79 cell line (American Type Culture Collection, ATCC) was maintained in RPMI-1640 medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin (P/S) under standard culture conditions (37 °C, 5% CO₂). For transfection experiments, cells were seeded in 6-well plates at a density of 3 × 10⁵ cells/well and allowed to adhere for 24 h before performing transient transfection using Lipofectamine 2000 (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol. Three distinct DOK7-targeting siRNA sequences were employed:

  • 5′-GCAATGATGTCCTCGTCTTGG-3′ (siRNA#1),

  • 5′-GCCAAGCGGATTCATCTTTGA-3′ (siRNA#2),

  • 5′-GCTGATGCTGGTCTACAAGGA-3′ (siRNA#3).

Quantitative polymerase chain reaction

Total RNA was extracted from cells using TRIzol reagent and reverse-transcribed into cDNA. qPCR was performed using SYBR Green, with triplicate reactions under standard cycling conditions. Data were normalized to GAPDH using the ΔΔCt method.
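The ΔΔCt calculation can be sketched as follows, assuming the conventional 2^(-ΔΔCt) fold-change formula and averaging of triplicate Ct values before subtraction. The Ct values are hypothetical and are not measurements from this study.

```python
from statistics import mean

def relative_expression(target_treated, reference_treated,
                        target_control, reference_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values. The reference gene
    (GAPDH in the text) is used to normalize the target gene within
    each condition before the two conditions are compared.
    """
    delta_treated = mean(target_treated) - mean(reference_treated)
    delta_control = mean(target_control) - mean(reference_control)
    delta_delta = delta_treated - delta_control
    return 2 ** (-delta_delta)

# Hypothetical triplicate Ct values for a knockdown and a control sample.
fold_change = relative_expression(
    target_treated=[26.1, 25.9, 26.0],
    reference_treated=[18.0, 18.1, 17.9],
    target_control=[24.0, 24.1, 23.9],
    reference_control=[18.0, 17.9, 18.1],
)
```

In this example the target gene crosses the threshold two cycles later in the knockdown sample after normalization, which corresponds to roughly one quarter of the control expression.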

CCK-8, colony formation, and transwell assays

Cells were reseeded in 96-well plates (3 × 10³ cells/well) and cultured for 0, 24, 48, and 72 h. Cell Counting Kit-8 reagent (Dojindo Laboratories, Kumamoto, Japan) was added (10 µL/well), followed by a 2-h incubation at 37 °C. Absorbance at 450 nm was measured using a microplate reader (BioTek Instruments, USA), with wells containing medium only serving as blanks.

Transfected cells were plated in 6-well plates (500 cells/well) and maintained in complete medium for 10–14 days with medium renewal every 3 days. Resultant colonies were fixed with 4% paraformaldehyde (15 min), stained with 0.1% crystal violet (30 min), and quantified using ImageJ software (NIH, USA).

For migration assessment, 1 × 10⁵ cells in serum-free medium were seeded into upper chambers (8-µm pores, Corning Costar, USA). For invasion evaluation, chambers were pre-coated with Matrigel (1:8 dilution, BD Biosciences). Complete medium containing 10% FBS served as chemoattractant in lower chambers. After 24-h incubation, migrated/invaded cells were fixed, stained, and counted from five random fields per membrane using phase-contrast microscopy (200 × magnification).

Introduction

Retinoblastoma (RB) is a rare but highly aggressive pediatric ocular malignancy, originating from the retinal precursor cells7,8. It remains a leading cause of childhood blindness and death if left untreated. Retinoblastoma is classified into two main forms: invasive and non-invasive9. Invasive tumors are characterized by their ability to spread beyond the retina, while non-invasive tumors remain confined to the ocular region10. The disparity in clinical outcomes between these two forms is largely attributed to differences in tumor biology, with invasive retinoblastomas exhibiting a more aggressive phenotype, higher rates of metastasis, and poor prognosis. Despite significant advances in therapeutic approaches, the molecular and cellular mechanisms driving tumor invasiveness and metastasis remain poorly understood8.

RB is clinically categorized into non-invasive and invasive subgroups based on tumor behavior11. Non-invasive cases are characterized by tumors confined to the retina and subretinal space, sometimes involving the vitreous, whereas invasive cases exhibit tumor infiltration into the choroid, optic nerve, sclera, or beyond, with a significant risk of metastasis. Over recent decades, various treatment strategies, including enucleation, cryotherapy, brachytherapy, and systemic, intraarterial, or intravitreal chemotherapy, have been developed to address RB12,13. Patients with invasive RB are at heightened risk of tumor metastasis, which severely impacts survival outcomes. This emphasizes the critical need for early diagnosis to enable timely intervention and to optimize treatment strategies that preserve vision, the eye, or life itself. Therefore, identifying distinct characteristics of invasive RB compared to non-invasive cases using non-invasive screening methods is essential for developing early diagnostic and prognostic markers that can improve patient management and outcomes14.

In recent years, the tumor microenvironment (TME) has gained increasing attention as a critical factor in tumor progression13,15,16,17. The TME is composed of a variety of cell types, including tumor cells, immune cells, fibroblasts, and endothelial cells, each contributing to the tumor’s growth, invasiveness, and therapeutic resistance. In retinoblastoma, however, the cellular and molecular landscape of the TME, particularly in relation to invasive and non-invasive disease, remains underexplored18,19,20,21. The integration of single-cell RNA sequencing (scRNA-seq) and bulk RNA-seq presents a promising strategy to elucidate the complex cellular interactions within the TME and identify key drivers of retinoblastoma progression22,23,24.

Bulk RNA-seq lacks the ability to resolve cellular-level heterogeneity. By integrating scRNA-seq and bulk RNA-seq, we can combine the high-resolution cellular insights of scRNA-seq with the comprehensive gene expression data of bulk RNA-seq25,26. This approach allows us to build a more complete picture of the TME, shedding light on the molecular differences between invasive and non-invasive retinoblastoma and identifying novel targets for therapeutic intervention.

One of the unique aspects of retinoblastoma is its origin in retinal precursor cells, which can differentiate into various retinal cell types, including cone photoreceptor cells. Cone cells are primarily responsible for color vision and visual acuity, and they represent a crucial component of the retina’s architecture. Recent studies suggest that cone cells, particularly in the context of retinoblastoma, play a critical role in tumor progression18,20. Invasive retinoblastomas often exhibit malignant transformation of cone precursor cells, leading to enhanced proliferation and metastatic potential. These cone-derived tumor cells may contribute to the aggressive phenotype of invasive tumors, altering the local microenvironment to promote invasion and resistance to treatment16,27,28.

This study hypothesizes that cone cells in the retina play a pivotal role in the malignant progression and invasiveness of retinoblastoma. It aims to examine the specific transcriptional signatures of cone cells within the tumor microenvironment (TME), comparing their presence and activity in invasive versus non-invasive retinoblastoma samples. Understanding how cone cells influence tumor behavior may uncover novel mechanisms underlying tumor progression, including the modulation of immune cell infiltration, stromal remodeling, and angiogenesis. Furthermore, targeting cone cell-derived pathways may provide novel therapeutic strategies for inhibiting tumor progression and improving clinical outcomes for patients with invasive retinoblastoma.

In this study, we aim to integrate scRNA-seq and bulk RNA-seq data to comprehensively characterize the tumor microenvironment in invasive and non-invasive retinoblastoma samples. By focusing on cone cells and their malignant progression, we aim to elucidate the critical molecular pathways that govern retinoblastoma invasiveness. Specifically, we will examine the differential expression of genes associated with cone cell differentiation, immune cell interactions, and extracellular matrix remodeling within the tumor microenvironment (TME). This integrative approach is expected to not only enhance our understanding of retinoblastoma biology but also identify key biomarkers and potential therapeutic targets that may be leveraged to improve treatment strategies for this challenging disease.

Results

Diverse cell types in retinoblastoma (RB) were characterized using scRNA-seq analysis

We analyzed the scRNA-seq data from the GEO database (GEO accession number: GSE249995), which included primary tumor lesions and adjacent normal tissues from 10 RB patients, to explore the tumor microenvironment (TME) (Fig. 1A)18,20. These 10 samples encompassed different RB subtypes, comprising 4 invasive and 6 non-invasive RB. After quality control and doublet removal, a total of 90,316 cells were retained for analysis1,2,29. Specifically, we obtained 41,753 cells from invasive RB and 48,563 cells from non-invasive RB patients (Fig. 1B). The relative proportion and number of cells in each cluster are depicted in Fig. 1C. Our analysis revealed that nearly all cell populations were present in both non-invasive and invasive RB, though their distribution varied significantly across patients (Fig. 1C, D). To annotate the cell types, we used classic marker genes identified in previous studies. Epithelial cells expressed high levels of PSCA and SPINK1, while T cells were characterized by elevated levels of CD3E. KRT5+ epithelial cells displayed high expression of KRT5, endothelial cells expressed ACKR1, and fibroblasts expressed DCN. The proportions of cone precursor (CP) cells, microglial cells, and MKI67+ cone precursors were significantly higher in invasive than in non-invasive RB patients (Fig. 1E). These cell populations may play crucial roles in shaping the TME in invasive RB patients. These findings suggest that variations in the cellular composition of RB, particularly in CP cells, might correlate with mechanisms of tumor invasion and metastasis.

Fig. 1

Single-cell transcriptomic profiling of invasive versus non-invasive RB. A Schematic diagram showing the experimental workflow of this study. Tumors from invasive and non-invasive RB were collected from 10 patients. B UMAP plots of single cells distributed by unbiased hierarchical clustering, colored by patient and by invasive versus non-invasive RB. C Fraction of cells in invasive and non-invasive RB. D Fraction of cells in each cell subpopulation. E Dot plot showing the Ro/e of each cluster in each tissue. Ro/e > 1 indicates that the cluster is preferentially distributed in invasive RB.

Aberrant gene expressions in cone precursor cells between invasive and non-invasive RB

Invasive RB is associated with a higher likelihood of metastasis to distant organs, which significantly impacts patient survival rates. Therefore, early diagnosis of invasive RB is crucial to enable timely intervention and the implementation of optimal treatment strategies, ultimately improving survival outcomes30. Previous studies have suggested that RB originates from maturing cone precursor cells (CPs). It is important to note that invasive RB often evolves from non-invasive forms7,10. However, the cell type composition of non-invasive lesions remains poorly characterized. Recent research has indicated that CPs represent one of the most abundant cell types in RB. To explore the heterogeneity of epithelial cell types during retinoblastoma differentiation, we performed unsupervised clustering analysis to subclassify the precursor cells. We identified six distinct subtypes of CPs, each characterized by distinct marker genes (Fig. 2A). The proportions of the CP subpopulations differed between the invasive and non-invasive groups. The CP5 cluster was exclusive to the invasive group, while CP6 was unique to the invasive group, reflecting significant sample heterogeneity and suggesting distinct roles in the tumor (Fig. 2B). The “inferCNV” analysis further revealed significant differences in copy number variation between epithelial subtypes derived from malignant cells and normal tissue epithelial cells, suggesting functional heterogeneity of epithelial cells within the tumor (Fig. 2C). These findings emphasize the critical role of epithelial cells in RB. To further elucidate the functional characteristics of the CP subpopulations, Gene Set Variation Analysis (GSVA) was conducted. The results revealed distinct pathway activities among the subpopulations: the CP4 subpopulation exhibited significantly elevated activity in the TGF-β signaling pathway, while the CP3 subpopulation showed increased activity in pathways associated with hypoxia and G2M checkpoint regulation. In contrast, the CP2 subpopulation exhibited reduced activity in these pathways but displayed pronounced activation of the IL6-JAK-STAT3 pathway, suggesting a unique immunoreactive profile. These findings highlight the functional heterogeneity within the CP subpopulations and their potential contributions to shaping the tumor microenvironment (Fig. 2D). To better understand the functional differences between non-invasive and invasive RB, we identified genes that were upregulated or downregulated in CP cells between the two groups. We observed that CP4 exhibited more cell-type-specific upregulated DEGs than downregulated DEGs (Fig. 2E). The upregulated DEGs were primarily enriched in pathways associated with RNA splicing and oxidative phosphorylation, whereas the downregulated DEGs were involved in pathways such as ribosome biogenesis, RNA catabolic processes, and cytoplasmic translation (Fig. 2F, G). This contrasting pattern highlights distinct functional alterations in cellular processes between the compared conditions.

Fig. 2

Aberrant gene expression profiles of CP cells in RB. A UMAP visualization of CP cell subpopulations. B Dot plot of the mean expression of marker genes for the 6 major CP lineages. C Proportion of CP subpopulations in invasive versus non-invasive RB and in individual samples. D Heatmap showing large-scale CNVs in epithelial cells from representative RB samples. The normalized CNV levels are shown; red represents a high CNV level, and blue represents a low CNV level. E GSVA pathway enrichment analysis in CP cell subpopulations. F Heatmap showing the distribution of upregulated (red) genes. The upper part represents the DEGs shared by at least two cell types, the lower panel represents the unique DEGs of each cell type, and the number below the dotted box represents the number of unique DEGs in each cell type. The enriched pathways of the DEGs of each cell type are also shown. G Heatmap showing the distribution of downregulated (blue) genes, arranged as in F (shared DEGs in the upper part, unique DEGs of each cell type and their counts in the lower panel, and the enriched pathways of the DEGs of each cell type).

Heterogeneity of CP cells during the invasion and metastasis of RB patients

To explore the developmental trajectories of CP cell subpopulations, we organized them based on their differentiation states. Using “CytoTRACE”, we identified CP4 as the initiation state with higher differentiation potential, while CP2 and CP6 represented terminal states in the differentiation trajectory (Fig. 3A, B). Pseudo-time analysis revealed that CP1 acted as a transitional state between CP2 and CP6 clusters (Fig. 3B). Notably, CP cells from some patients clustered closely at the end of the branches, indicating a homogeneous terminal phenotype, whereas others exhibited diverse and heterogeneous profiles along the cancer developmental trajectories. Further analysis using Monocle2 elucidated the developmental trajectory of malignant CP cells, demonstrating that different clusters occupied distinct stages of differentiation (Fig. 3C, D). KEGG pathway analysis revealed distinct functional enrichments in the four clusters. Cluster 1 was enriched in pathways related to cytoplasmic translation, ribonucleoprotein complex biogenesis, ribosomal biogenesis, and protein-RNA complex assembly, indicating a strong association with protein synthesis and ribosome-related processes31. Cluster 2 was enriched in DNA repair, chromosome segregation, DNA replication, and regulation of chromosome organization, highlighting its role in genomic stability and cell division. Cluster 3 was associated with pathways including neuron projection development, cell growth, and regulation of cell projection organization, suggesting involvement in neuronal development and morphogenesis. Finally, Cluster 4 exhibited enrichment in cellular respiration, oxidative phosphorylation, and energy precursor generation, emphasizing its role in energy metabolism and ATP production. These findings underscore the functional diversity and specialization among the clusters, highlighting their distinct roles in cellular processes (Fig. 3E). 
In summary, this comprehensive analysis highlights the dynamic developmental trajectories and functional heterogeneity of CP cell subpopulations, providing valuable insights into their roles in tumor progression and potential therapeutic targets.

Fig. 3

Defining cone precursor cells by gene expression. A, B “CytoTRACE” prediction of the differentiation potential of malignant CPs in RB. C Trajectory order of the CP cell populations according to the pseudo-time value. D Developmental trajectories of CP cells in each sample. E Clustered heatmap of genes differentially expressed along pseudo-time (P < 0.01), shown together with the KEGG pathway enrichment results of each cluster.

Comparison of cell–cell interactions in the TME between invasive and non-invasive RB patients

To further investigate potential cell–cell interactions within the TME, we employed the “CellChat” software16,28. Distinct differences in cell–cell communication patterns were observed between invasive and non-invasive RB patients. For instance, in non-invasive RB patients, CP6 predominantly interacted with cone cells, whereas in invasive RB, cone cells showed stronger interactions with fibroblasts (Fig. 4A). To quantify these differences, we calculated the ratio of total interaction frequencies for each cell pair between the two groups (Fig. 4B). Notably, the interaction ratios between invasive and non-invasive RB revealed that rod/CP5 and retinoma-like cell/CP5 interactions were the most upregulated, while fibroblast/CP6 interactions were the most downregulated (Fig. 4B). These findings highlight the dynamic rewiring of cell–cell communication networks during RB progression and suggest potential mechanisms underlying tumor invasiveness. To delve deeper into the molecular mechanisms driving changes in cellular interactions within the TME of invasive and non-invasive RB, we utilized the “NicheNet” R package. This tool enabled us to explore ligand–receptor crosstalk by analyzing expression profiles and downstream targets of ligand–receptor pairs. Specifically, we employed the “nichenet_seuratobj_aggregate” function to conduct differential cell communication analysis and reconstruct receptor–ligand regulatory networks28. In our analysis, we designated immune cells and stromal components (such as fibroblasts, retinoma-like cells, rod cells, neural cells, and cone cells) as the sender population to map out cell–cell regulatory networks. Our results revealed a multitude of significantly differentially expressed ligand–receptor pairs between invasive and non-invasive tumors, including BMP7–BMPR2, EFNA5–EPHA8, and LAMB1–CADM3 (Figs. 4C, S1).
This comprehensive characterization of ligand–receptor interactions (LRIs) in invasive and non-invasive RB underscores the pivotal role of gene expression dynamics among diverse cell types within the TME in the initiation and progression of RB. The KEGG pathway enrichment analysis revealed significant pathways associated with ferroptosis, including the p53 signaling pathway, cellular senescence, apoptosis, and cell cycle. Additionally, key signaling pathways such as JAK-STAT, HIF-1, and PI3K-AKT were identified (Fig. 4D). We examined the expression of receptor–ligand pair genes across different cell types. For example, BMP7 was predominantly expressed in CP cells and neural cells, whereas CD99 was mainly expressed in a subset of CP cells (Fig. 4E). These findings collectively underscore the critical role of dynamic ligand–receptor interactions and signaling pathway modulation in shaping the tumor microenvironment and driving retinoblastoma progression.

Fig. 4

The cell–cell crosstalk network in invasive versus non-invasive RB. A Detailed network of cell–cell interactions among eleven cell subsets, compared between invasive and non-invasive RB. B Dot plots depicting the changes in the numbers of putative ligand–receptor pairs in invasive RB compared with non-invasive RB (red, increased; blue, decreased). C Comparison of ligand–receptor interactions between invasive and non-invasive RB in several key cell types. (a) Ligand activity prediction by the “NicheNet” algorithm, showing the top 20 ligands that best predict all DEGs between sender and receiver cells. (b) Bubble plot showing the expression patterns of the predicted ligands on other immune cells. (c) Ligand–target matrix in RB. D KEGG enrichment analysis of target genes downstream of receptor–ligand regulation. E Receptor–ligand gene expression in invasive and non-invasive RB patients.

Molecular characteristics of different molecular subtypes of RB based on gene expression

Bulk RNA-seq captures the average gene expression across all cells, while scRNA-seq enables precise cell-type-specific transcriptome profiling and the identification of aberrant gene expression at the resolution of individual cells1,16,28. From the DEGs, we pinpointed genes with statistically significant associations for survival analysis, encompassing tissue-specific genes, cell-type-specific aberrant genes, and pseudo-temporal differentiation genes tracing the transition from normal to tumor cells (Fig. 5A). By evaluating the cumulative distribution function (CDF) and delta area, we identified k = 2 as the optimal clustering parameter, yielding two robust molecular subtypes with distinct transcriptional programs (Fig. 5B–D). Differential gene expression analysis between the two clusters identified a large set of DEGs, suggesting marked transcriptional differences between them (Fig. 5E, F). Pathway enrichment of these DEGs revealed that subtype 1 was predominantly enriched in proliferative and biosynthetic pathways, including cell cycle, DNA replication, ribosome, spliceosome, ferroptosis, and PI3K–AKT signaling, whereas subtype 2 was associated with immune and microenvironmental signaling, showing enrichment in focal adhesion, ECM–receptor interaction, cytokine–cytokine receptor interaction, Toll-like receptor, NF-κB, TNF, HIF-1, and T-helper cell differentiation pathways (Fig. 5G, H). Together, these findings suggest that the common DEGs identified at both bulk and single-cell resolutions not only serve as potential prognostic biomarkers for RB patients but also reflect an immunosuppressive TME, offering insights into the molecular mechanisms driving tumor heterogeneity and progression.

Fig. 5

Identification and validation of the DEG-based classification of RB patients. A Workflow for CP-related gene set construction and molecular subtyping. B–D Consensus clustering matrix for k = 2, which was the optimal cluster number in the bulk transcriptome. E Differentially expressed genes between the two subtypes defined by consensus clustering. F Chord diagram showing the ligand–receptor interactions between each cell category. G GSEA of pathways for upregulated genes. H GSEA of pathways for downregulated genes.

The immune landscape of RB patients in different subtypes

We investigated the relationship between patient subgroups and immune infiltration in RB. We used the “CIBERSORT” method for cell-type deconvolution and applied the Wilcoxon test to compare the distribution of 22 infiltrating immune cell types across the RB molecular subgroups (Fig. 6A, B). Notably, except for central memory CD8+ T cells and Treg cells, the distribution of most immune cell types differed significantly between subgroups. Subtype 1 was enriched in regulatory T cells and M2 macrophages, consistent with an immunosuppressive tumor microenvironment (TME) (Fig. 6C, D). In addition, several ICD-related genes (e.g., CXCL10, FPR1, HGF, IFNAR1, P2RX7, TLR4, ANXA1, TLR3) were significantly upregulated in subtype 1 (Fig. 6E). However, this upregulation did not correspond to enhanced immune activation, suggesting that ICD-related signaling may be insufficient to overcome the suppressive milieu of subtype 1. By contrast, subtype 2 exhibited higher expression of immune checkpoint molecules, including CD200, CD70, IDO1, LAG3, NRP1, PDCD1, PDCD1LG2, TIGIT, TNFRSF25, TNFRSF4, TNFRSF8, TNFRSF9, TNFSF4, and VSIR (Fig. 6F), indicative of checkpoint-mediated immunoregulation. Taken together, these findings suggest that subtype 1 is characterized by an immunosuppressive TME, whereas subtype 2 may be shaped by checkpoint-dependent immune regulation.
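
The per-cell-type comparison between subgroups can be sketched as a two-sided Wilcoxon rank-sum test. This is a minimal illustration that uses the normal approximation without tie or continuity correction. The infiltration fractions below are invented for the example, and the function names are hypothetical rather than part of any deconvolution package.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2       # U statistic for group x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))            # two-sided p-value
    return u1, z, p


# Invented fractions of one immune cell type in two subgroups (illustration only).
subtype1 = [0.30, 0.28, 0.35, 0.32, 0.31]
subtype2 = [0.10, 0.12, 0.08, 0.15, 0.11]
u_stat, z_score, p_value = rank_sum_test(subtype1, subtype2)
print(u_stat, z_score, p_value)
```

With 22 cell types tested, the resulting p-values would normally be adjusted for multiple comparisons; the source does not state which correction, if any, was applied.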

Fig. 6

The immune landscape in distinct molecular patterns in RB patients. A “CIBERSORTx” analysis identifying the relative infiltration level of immune cell populations in two subgroups of RB samples. B Differences in cell-type proportions among distinct molecular pattern subgroups in cohorts. C “CIBERSORTx” analysis identifying the relative infiltration level of immune cell populations in two subgroups of RB samples in the TCGA cohort. D Differences in cell-type proportions among distinct molecular pattern subgroups in cohort 2. E Association between immune subtypes and ICD modulators. Differential expression of ICD genes among the RB immune subtypes. F Association between immune subtypes and ICP modulators. Differential expression of ICP genes among the RB immune subtypes.

Weighted gene co-expression network construction for invasive and non-invasive RB

To identify regulatory genes associated with the onset of RB tumor invasion, we analyzed data from invasive and non-invasive patient groups. Using Weighted Gene Co-expression Network Analysis (WGCNA), we constructed correlation networks to identify clusters of highly correlated genes in microarray samples. After clustering the samples, an appropriate threshold (cutHeight = 120) was set to remove obvious outliers (Fig. 7A). A soft threshold power (β = 10) was applied to create the adjacency matrix, ensuring the gene distribution adhered to a scale-free network while retaining valuable connectivity information. With parameters minModuleSize = 30 and mergeCutHeight = 0.25, a total of 10 modules were generated and identified (Fig. 7B). To explore the relationship between modules and phenotypes, we calculated the correlation coefficients of each module with invasive traits. The results revealed that one module (MEgreen) exhibited a statistically significant correlation with the invasive phenotype (Fig. 7C). We further screened the top 10 most connected genes from MEgreen and found that DOK7 was expressed at significantly higher levels in the invasive group than in the non-invasive group (Fig. 7D). qPCR analysis revealed that DOK7 expression was significantly downregulated in the siRNA#1 group compared to the NC group, whereas no notable reduction was observed in the siRNA#2 or siRNA#3 groups (Fig. 7E). Consequently, siRNA#1 was selected for DOK7 silencing. Functional assays showed that DOK7 knockdown reduced Y79 cell proliferation, as confirmed by CCK-8 and colony formation assays (Fig. 7F). It also inhibited migration, as shown by the scratch assay (Fig. 7G), and reduced both migration and invasion in the Transwell assay (Fig. 7H, I).
Taken together, these results indicate that DOK7 plays a critical role in promoting proliferation, migration, and invasion of Y79 cells. Targeting DOK7 may therefore represent a potential therapeutic strategy for inhibiting the invasive behavior of these cells.
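
The soft-thresholding and hub-gene steps described above can be sketched as follows. This is a minimal illustration, assuming an unsigned network in which the adjacency is the absolute Pearson correlation raised to the power β (the source does not state the network type). The gene names, expression values, and function names are hypothetical, and this is not the WGCNA package API.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=10):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair (assumed form)."""
    genes = list(expr)
    return {
        g: {h: abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g}
        for g in genes
    }


def connectivity(adj):
    """Connectivity of each gene: the sum of its adjacencies to all other genes."""
    return {g: sum(neighbors.values()) for g, neighbors in adj.items()}


# Hypothetical expression profiles across five samples (illustration only).
toy_expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 1, 4, 2, 3],
}
adjacency = soft_threshold_adjacency(toy_expr, beta=10)
k = connectivity(adjacency)
hub_ranking = sorted(k, key=k.get, reverse=True)  # most connected genes first
print(hub_ranking, k)
```

Raising correlations to a high power suppresses weak correlations much more strongly than strong ones, which is why a gene such as the hypothetical geneC contributes almost nothing to the connectivity of the other two.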

Fig. 7

The detection of module genes using weighted gene co-expression network analysis (WGCNA) in invasive and non-invasive RB. A Outlier samples were removed from the analysis of invasive versus non-invasive groups. B Clustering was performed using gene expression data, with color intensity representing disease status (invasive vs. non-invasive). C In the gene tree, different colors denote distinct gene co-expression modules. E A collinear heatmap of module feature genes is shown, where red indicates high correlation and blue represents low or inverse correlation. D DOK7 expression in Y79 cells was quantitatively analyzed by qPCR. E Proliferation attenuation mediated by DOK7 knockdown was validated through CCK-8 viability assay (72-h time course). F Colony formation capacity assessment confirmed the anti-proliferative effect of DOK7 silencing. G Wound healing assay demonstrated impaired migratory potential in DOK7-deficient cells (0/48-h monitoring). H The Transwell migration chamber (upper) and Matrigel invasion assays (lower) jointly verified the suppressive effect of DOK7 silencing on both migratory and invasive capacities (24-h incubation).

Discussion

This study provides a detailed characterization of the tumor microenvironment (TME) in RB using single-cell RNA sequencing (scRNA-seq) to analyze the differences between invasive and non-invasive RB32,33,34. Our findings highlight the crucial role of the TME in determining the aggressiveness of invasive RB and suggest potential avenues for therapeutic targeting.

We identified significant differences in cellular composition between invasive and non-invasive RB. Invasive tumors exhibited a higher proportion of cone precursor cells, microglial cells, and MKI67+ cone precursors, which may contribute to tumor progression and immune modulation. These cell populations likely play key roles in shaping the TME, promoting an immunosuppressive environment that facilitates tumor invasion and metastasis. Understanding these cellular dynamics is critical for advancing our knowledge of RB pathogenesis and identifying novel therapeutic strategies.

Further analysis revealed that invasive RB is associated with a more immunosuppressive TME, marked by elevated levels of regulatory T cells (Tregs) and myeloid-derived suppressor cells (MDSCs). These immune cells contribute to immune evasion and tumor progression, which is consistent with the poor prognosis observed in invasive RB patients. Notably, macrophage subpopulations, such as macrophage-0, macrophage-2, and macrophage-3, were found to correlate with survival outcomes, further emphasizing the importance of immune cell composition in RB prognosis.

In addition, we explored cell–cell interactions within the TME using “CellphoneDB” and “NicheNet” analyses. Distinct differences were observed in the interaction patterns between cell types in invasive and non-invasive RB. For example, epithelial cells in invasive RB interacted more closely with myo-CAFs, a hallmark of aggressive tumors. These interactions are crucial for the progression of RB and underscore the importance of stromal-tumor crosstalk in invasive disease.

A potential limitation of our study is the possible confounding effect of tumor stage and burden. Invasive retinoblastomas are frequently associated with higher ICRB stage, optic nerve or choroidal invasion, and vitreous or subretinal seeding, all of which reflect greater tumor burden. Therefore, some of the observed TME differences—such as the enrichment of immunosuppressive cell populations and enhanced fibroblast–tumor interactions—may partly reflect disease stage rather than invasiveness itself. To address this concern, we performed subgroup analyses stratified by ICRB stage and seeding status, which showed that invasive tumors retained distinct TME features even after adjusting for stage.

Our study also identified two molecular subtypes of RB from gene expression profiles. Subtype 1, characterized by active cell cycle and DNA replication pathways, correlates with increased tumor aggressiveness. In contrast, subtype 2 showed upregulation of immune checkpoint genes and correlates with an immunosuppressive TME, potentially explaining its poorer response to conventional therapies.

In conclusion, our comprehensive analysis highlights the pivotal role of the TME in RB invasiveness. The identification of key cellular and immune populations, as well as specific cell–cell interactions, provides crucial insights into the mechanisms underlying RB progression. These findings may guide the development of targeted therapies focused on modulating the TME to enhance patient outcomes, particularly for those with invasive RB.