Introduction

Gliomas represent the most common and lethal primary tumors of the central nervous system (CNS)1. These tumors exhibit remarkable molecular and clinical heterogeneity, making them challenging to treat effectively. The latest World Health Organization (WHO) 2021 classification has refined the molecular characterization of gliomas, classifying them into three main categories2,3. In adults, diffuse gliomas are categorized according to the status of isocitrate dehydrogenase (IDH) 1 and 2 genes, as well as the presence of a 1p/19q chromosomal co-deletion: 1) astrocytoma, IDH-mutant; 2) oligodendroglioma, IDH-mutant with 1p/19q co-deletion; and 3) glioblastoma, IDH wild-type. Tumor grading follows a nine-step protocol that evaluates tumor location, histology, and molecular markers4.

The integration of molecular targets into the classification of gliomas has significantly advanced our understanding of their pathogenesis3,5. This refined system facilitates more accurate diagnoses and tailored treatment strategies, which have notably improved survival rates, particularly for WHO grades 02 and 036. Nonetheless, recurrence remains a frequent outcome for these patients, and the prognosis for grade 04 gliomas has remained stagnant over the decades7. Therefore, there is a need for a deeper understanding of the molecular mechanisms underlying glioma development.

Transcriptional regulation plays a crucial role in the biology of gliomas8–10. Alterations in chromatin structure and epigenetic modifications can significantly affect tumor aggressiveness and phenotype, suggesting that disruptions in genetic regulation are central to tumor formation and progression11–13. In this context, investigating gene regulatory networks (GRNs) is essential for identifying and characterizing transcription factors (TFs) along with their target genes14–16. GRNs represent intricate regulatory interactions that control gene expression, dictating cellular fate and response to external signals. A core element of GRNs is the regulon, which refers to a transcription factor and the set of genes it directly regulates. These regulons reflect coordinated regulatory programs that can be inferred from co-expression and mutual information patterns in transcriptomic data.

In gliomas, the dysregulation of GRNs may contribute to tumor heterogeneity, therapy resistance, and progression. Computational inference of GRNs from transcriptomic data provides valuable insights into these regulatory interactions, enabling the identification of key regulators involved in glioma biology.

To reconstruct GRNs, we utilized the RTN package16, a framework specifically designed for regulatory network analysis based on mutual information and transcriptional regulons. RTN employs the ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) algorithm to infer TF-target interactions17, followed by bootstrapping and statistical refinement to enhance robustness. This approach allows the identification of master regulators and their downstream targets, uncovering potential molecular drivers of glioma progression.
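The two core ARACNe steps can be illustrated with a small sketch. First, TF-gene pairs are scored by mutual information (MI). Second, the data processing inequality (DPI) removes the weakest edge of every fully connected triplet as a likely indirect interaction. This is a minimal illustration under stated assumptions, not RTN's implementation. The Gaussian MI estimate, the fixed `mi_threshold`, and the function names are hypothetical choices. RTN instead sets significance by permutation and adds bootstrap consensus.

```python
import numpy as np
from itertools import combinations


def gaussian_mi(x, y):
    """MI in nats under a bivariate-Gaussian assumption (an illustrative estimator)."""
    r = np.corrcoef(x, y)[0, 1]
    r = np.clip(r, -0.999999, 0.999999)  # avoid log(0) for perfectly correlated pairs
    return -0.5 * np.log(1.0 - r ** 2)


def aracne_sketch(expr, tfs, mi_threshold):
    """Return {frozenset({a, b}): MI} for TF-involving edges that survive the DPI.

    expr: dict mapping gene name -> 1-D expression vector over the same samples.
    tfs: set of gene names treated as transcription factors.
    """
    genes = sorted(expr)
    # Step 1: keep pairs involving at least one TF whose MI exceeds the threshold.
    mi = {}
    for a, b in combinations(genes, 2):
        if a in tfs or b in tfs:
            value = gaussian_mi(expr[a], expr[b])
            if value > mi_threshold:
                mi[frozenset((a, b))] = value
    # Step 2: DPI. In every fully connected triplet the weakest edge is
    # treated as indirect and removed (zero tolerance in this sketch).
    kept = dict(mi)
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(edge in mi for edge in triangle):
            kept.pop(min(triangle, key=mi.get), None)
    return kept
```

In a toy chain where TF1 drives TF2 and TF2 drives GENE, all three pairs are correlated. The DPI discards the indirect TF1-GENE edge because it is the weakest edge of the triplet.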

In light of the pressing need for deeper insights into glioma biology, this study employs an integrative systems approach to uncover potential mechanisms and molecular targets of prognostic significance. We analyzed two RNA-seq datasets comprising 989 samples from patients with primary glioma, reconstructed GRNs for both datasets, and identified key prognostic genes, contributing to a deeper understanding of the pathology of these dynamic and challenging diseases.

Results

Reconstruction and activity analysis of gene regulatory networks in glioma

To address the inherent diversity and complexity of gliomas, we reconstructed their GRNs, grouping genes into regulatory units centered around TFs. Using the RTN package, we identified regulons to investigate regulatory mechanisms driving glioma biology.

We utilized two publicly available datasets: The Cancer Genome Atlas (TCGA) and the Chinese Glioma Genome Atlas (CGGA)18. The TCGA dataset comprised samples reclassified according to the 2021 WHO glioma classification, whereas the CGGA dataset followed the 2016 WHO classification due to the lack of updated annotations.

Following GRN reconstruction, we evaluated regulon activity in both the TCGA and CGGA datasets using two-tailed Gene Set Enrichment Analysis (GSEA) (Fig. 1a). This method scores positively and negatively regulated targets separately and therefore captures the direction of regulation. The analysis assigned a regulon activity score to each individual sample, providing a quantitative, per-sample measure of regulatory activity.
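The per-sample two-tailed score can be sketched as the difference between two GSEA enrichment scores: one for the regulon's positively regulated targets and one for its negatively regulated targets (dES = ES_pos - ES_neg). This is a simplified sketch. Ranking each gene by its deviation from the cohort mean, the weighted Kolmogorov-Smirnov running sum, and the function names are assumptions. RTN's own ranking metric, weighting, and permutation-based normalization may differ.

```python
import numpy as np


def enrichment_score(ranked_stats, ranked_genes, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov running-sum enrichment score (GSEA-style)."""
    in_set = np.isin(ranked_genes, list(gene_set))
    hit_weights = np.abs(ranked_stats) ** weight * in_set
    p_hit = np.cumsum(hit_weights) / hit_weights.sum()
    p_miss = np.cumsum(~in_set) / (~in_set).sum()
    running_sum = p_hit - p_miss
    return running_sum[np.argmax(np.abs(running_sum))]  # signed maximum deviation


def two_tailed_activity(sample_expr, cohort_mean, genes, positive_targets, negative_targets):
    """dES = ES(positively regulated targets) - ES(negatively regulated targets)."""
    stats = np.asarray(sample_expr, float) - np.asarray(cohort_mean, float)
    order = np.argsort(-stats)  # most up-regulated gene first
    ranked_stats, ranked_genes = stats[order], np.asarray(genes)[order]
    es_pos = enrichment_score(ranked_stats, ranked_genes, positive_targets)
    es_neg = enrichment_score(ranked_stats, ranked_genes, negative_targets)
    return es_pos - es_neg
```

A sample in which positive targets sit at the top of the ranking and negative targets at the bottom receives a score near +2, indicating an active regulon. The mirror image receives a score near -2.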

Fig. 1

Least absolute shrinkage and selection operator (LASSO) analysis of regulons from TCGA and CGGA: (a) Schematic workflow. (b) Cross-validation results demonstrating the performance of LASSO for variable coefficient selection. The plot illustrates the partial likelihood deviance across different values of the regularization parameter (lambda), aiding in the selection of the optimal lambda. (c, d) LASSO coefficients for the identified regulons in the TCGA and CGGA datasets, respectively. Each dot represents the magnitude and direction of the effect of each variable on the model. (e) Hierarchical tree-and-leaf representation of the identified regulons; the size of each circle corresponds to the number of genes within each regulon. Green circles indicate regulons derived from TCGA, while orange circles represent those from CGGA. (f) Heatmaps displaying the activity levels of the identified regulons across all samples from CGGA and TCGA.

Survival-associated regulons and their functional roles in glioma

To identify potential regulons associated with survival in glioma samples, we applied the Least Absolute Shrinkage and Selection Operator (LASSO) method in conjunction with Cox regression, using age and tumor grade as covariates (Fig. 1b; Supplementary Fig. 1a and 1b; Supplementary Table 1). This analysis revealed 22 regulons with non-zero coefficients in the CGGA dataset (FOXM1, DMRTA2, GPBP1L1, GSX2, ZNF675, SOX10, SATB2, SHOX2, POU4F1, NHLH2, NEUROG1, ZNF501, ZIM2, EBF3, ZNF474, ZNF607, SPZ1, TP63, ZNF683, FOXG1, GATA5, and HNF4A; Fig. 1c) and 28 in the TCGA dataset (ATF6B, DBP, DMRTA2, GATA4, GLI4, GRHL3, HOXA2, IRX5, KIN, MEIS1, NEUROG3, NFX1, OTP, POU3F1, SETBP1, SNAI3, SOX10, TBPL1, THAP1, TRPS1, XPA, ZNF20, ZNF333, ZNF423, ZNF736, ZNF845, ZNF91, and ZSCAN32; Fig. 1d), suggesting their potential prognostic relevance.

Further analysis using RTN revealed similarities in network clustering among several regulons, as shown in the hierarchical tree-and-leaf network representation. Despite minimal overlap between the regulons identified in the two datasets, with only SOX10 common to both (Fig. 1e), the tree-and-leaf representation highlighted shared network similarities across datasets. Interestingly, these shared clusters corresponded to distinct TFs, suggesting potential functional convergence among different regulons.

Hierarchical cluster analysis of regulon activity levels in the TCGA and CGGA datasets (top and bottom, respectively) distinctly identified a cluster corresponding to WHO grade 04 samples, alongside other samples associated with poor overall survival (Fig. 1f). Because the identified regulons differed between datasets, we next focused on the genes associated with each regulon, as their analysis provided deeper insights into the underlying regulatory networks while mitigating batch effects.

Using gene lists extracted from the RTN results, we visualized the associated gene networks, identifying 1700 genes across the 28 TCGA regulons and 1083 genes across the 22 CGGA regulons (Supplementary Figure 1c). While the regulons differed between datasets, enrichment analysis of the associated gene sets revealed shared biological processes (BPs) across regulons in both datasets. For instance, the DMRTA2 and SHOX2 regulons in the CGGA shared biological processes with the IRX5 regulon in the TCGA dataset, primarily related to synaptic signaling and cell proliferation (Fig. 2a; Supplementary Table 2).

Fig. 2

Characterization of genes within the regulons: (a) Functional enrichment analysis of all identified regulons from CGGA and TCGA, highlighting the biological processes significantly associated with the gene sets. (b) Venn diagram illustrating the overlap of genes within prognostically relevant regulons from CGGA and TCGA, indicating the number of unique and shared genes. (c, d) Elastic net variable selection in the CGGA and TCGA datasets, respectively. The cross-validation plots show partial likelihood deviance as a function of the regularization parameters, used to identify the optimal model, alongside the elastic net coefficients of the selected genes. Each dot represents the magnitude and direction of the effect of each variable on the model.

Additionally, individual regulons specific to either TCGA or CGGA highlighted key aspects of glioma biology, including immune infiltration, apoptosis, and protein metabolism. These findings underscore the ability of regulons to capture critical prognostic processes, as previously described19–23.

To further evaluate the relationship between these genes and survival outcomes, we focused on the 162 genes common to both datasets (Fig. 2b). Enrichment analysis revealed that these genes were significantly associated with pathways related to synaptic signaling, embryonic development, and cell division (Supplementary Fig. 2a; Supplementary Table 3). These findings strengthen the hypothesis that synaptic integration plays a pivotal role in glioma development and underscore the potential involvement of glioma stem cells in these regulatory networks24–26.

Prognostic significance of common genes identified across TCGA and CGGA

To identify potential prognostic variables, we performed elastic net regularization combined with Cox regression analysis, focusing on the 162 genes common to both the TCGA and CGGA datasets. This integrative approach yielded 31 genes from the TCGA dataset and 32 genes from the CGGA dataset, each with a non-zero positive or negative coefficient (Fig. 2c,d; Supplementary Table 4).

In the CGGA dataset, genes with positive and negative coefficients included GAS2L3, HOXA3, OTP, TXLNB, HOXD13, IRX1, KIF14, SLC2A10, SSH3, MOXD1, SOX10, HDAC4, MADD, GFRA1, ALDH5A1, CSTB, WBP1L, GABRB3, GBF1, COPZ2, CRY2, CRTAC1, SLC1A4, GNL1, ARPP21, KCNJ11, SLITRK5, RNF157-AS1, GNAL, BICC1, LRRTM4, and LCE1C (Fig. 2c). Similarly, in the TCGA dataset, genes with positive and negative coefficients included PLEK2, ISG20, GAS2L3, OTP, TXLNB, ATP10B, HOXD13, IRX1, EFEMP2, TKTL1, EMP3, ERBB3, PLAT, TNR, NSG2, GABRB3, NRG3, DGCR2, LUZP2, CRY2, MRPS16, CRTAC1, GNL1, TEF, RNF180, SLITRK5, CTBP2, GRID1, GRHPR, LCE1C, and CTB-1I21.1 (Fig. 2d). Among these genes, 11 were common to both datasets: CRTAC1, GABRB3, GNL1, CRY2, SLITRK5, LCE1C, GAS2L3, HOXD13, TXLNB, OTP, and IRX1 (Fig. 3a).
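The overlap reported above is a plain set intersection. The short sketch below reproduces it from the two gene lists in this paragraph and serves as a consistency check on the reported counts. The variable names are illustrative.

```python
# Genes with non-zero elastic net coefficients, copied from the lists above.
cgga_genes = {
    "GAS2L3", "HOXA3", "OTP", "TXLNB", "HOXD13", "IRX1", "KIF14", "SLC2A10",
    "SSH3", "MOXD1", "SOX10", "HDAC4", "MADD", "GFRA1", "ALDH5A1", "CSTB",
    "WBP1L", "GABRB3", "GBF1", "COPZ2", "CRY2", "CRTAC1", "SLC1A4", "GNL1",
    "ARPP21", "KCNJ11", "SLITRK5", "RNF157-AS1", "GNAL", "BICC1", "LRRTM4", "LCE1C",
}
tcga_genes = {
    "PLEK2", "ISG20", "GAS2L3", "OTP", "TXLNB", "ATP10B", "HOXD13", "IRX1",
    "EFEMP2", "TKTL1", "EMP3", "ERBB3", "PLAT", "TNR", "NSG2", "GABRB3",
    "NRG3", "DGCR2", "LUZP2", "CRY2", "MRPS16", "CRTAC1", "GNL1", "TEF",
    "RNF180", "SLITRK5", "CTBP2", "GRID1", "GRHPR", "LCE1C", "CTB-1I21.1",
}

# Genes selected in both datasets (the shared set in the upset plot).
common = sorted(cgga_genes & tcga_genes)
print(len(cgga_genes), len(tcga_genes), len(common))  # prints: 32 31 11
print(common)
```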

Fig. 3

Common prognostic genes. (a) Upset plot illustrating the overlap of genes identified by elastic net in the TCGA and CGGA datasets. (b) Hazard ratios for the eleven common prognostic genes, adjusted for tumor grade and patient age. The plot presents the hazard ratios with corresponding confidence intervals. (c, d) Survival curves for CGGA and TCGA, respectively, stratified at the cutpoint that maximizes the log-rank test statistic, demonstrating the differential survival probabilities of patients with high versus low expression levels. The curves are accompanied by log-rank test p-values to assess the statistical significance of the observed differences. The Benjamini–Hochberg (BH) procedure was used to adjust for multiple hypothesis testing.

To further explore the prognostic significance of the 11 candidate genes, we conducted individual evaluations using a Cox proportional hazards model, adjusting for tumor grade and age as covariates. Of these genes, GAS2L3, TXLNB, HOXD13, OTP, and IRX1 had positive coefficients in both datasets, while LCE1C, CRTAC1, GNL1, SLITRK5, and CRY2 showed negative coefficients across both datasets (Fig. 3b).
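The link between a Cox coefficient and the reported hazard ratio (HR) is the exponential. A positive coefficient gives HR > 1 (higher hazard, poorer survival), and a negative coefficient gives HR < 1. The sketch below computes the HR and a Wald 95% confidence interval from a coefficient and its standard error, which is how R's `coxph` summary reports them. The input values in the usage example are illustrative, not values from this study.

```python
import math

Z_975 = 1.959963984540054  # standard-normal 97.5% quantile (R: qnorm(0.975))


def hazard_ratio(coef, se):
    """Return (HR, lower 95% CI, upper 95% CI) from a Cox coefficient and its SE."""
    return math.exp(coef), math.exp(coef - Z_975 * se), math.exp(coef + Z_975 * se)


# Illustrative inputs only: a coefficient of 0.4 with SE 0.1.
hr, lower, upper = hazard_ratio(0.4, 0.1)
print(f"HR = {hr:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```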

Among these genes, GAS2L3, HOXD13, and OTP emerged as the only ones with significant hazard ratios consistently observed in both datasets. Stratifying samples based on high and low expression of each gene revealed that these three genes effectively distinguished survival outcomes across different tumor grades and datasets (Fig. 3c,d).

We compared survival between high- and low-expression groups for each of the 11 genes described above, which revealed significant differences (Supplementary Fig. 3 and Supplementary Table 5). In the CGGA dataset, GAS2L3 had a pronounced impact on survival probabilities across WHO grades II, III, and IV (p = 0.00039, p < 0.0001, and p = 0.00061, respectively). Similarly, in the TCGA dataset, GAS2L3 was significantly associated with survival outcomes for grades 03 and 04 (p = 0.00453 and p = 0.00283, respectively), whereas no significant effect was observed for grade 02.

For HOXD13, survival analysis likewise revealed significant differences for grades II, III, and IV in the CGGA dataset (p = 0.00827, p < 0.0001, and p = 0.00635, respectively), but only for grades 03 and 04 in the TCGA dataset (p = 0.00064 and p = 0.00478, respectively). OTP was significant for grades II and IV in the CGGA dataset (p = 0.00039 and p = 0.0052, respectively), whereas in the TCGA dataset, significance was observed only for grade 04 (p = 0.0348).

Functional characterization of prognostic genes in glioma subpopulations

To explore the potential functions and cell-type specificity of the target genes, we used a single-cell RNA sequencing (scRNA-seq) dataset compiled by Abdelfattah et al.27, following the workflow illustrated in Fig. 4a. This dataset includes 201,986 cells from patients with primary and recurrent glioblastoma, grade 02 astrocytoma, and oligodendroglioma. The cells fall into eight populations, as previously annotated by Abdelfattah et al.: B cells, T cells, endothelial cells (Endo), glioma cells, myeloid cells, oligodendrocytes (Oligo), pericytes, and other cells (Fig. 4b).

Fig. 4

Gene expression at the single-cell level. (a) Schematic representation of the analysis workflow used to assess gene expression at the single-cell level. (b) UMAP plot displaying the distribution of all 201,986 single cells, with cells color-coded according to their respective labels. (c) Feature plots illustrating the expression levels of the eleven prognostic genes across all cells, showing the distribution of gene expression within the cell population in UMAP space. (d) Dotplot showing average expression (color gradient) and percentage of cells expressing each gene (dot size) across cell populations.

In addition, we explored the expression of the eleven genes associated with patients’ overall survival (OTP, GAS2L3, TXLNB, HOXD13, IRX1, GABRB3, LCE1C, CRTAC1, GNL1, SLITRK5, and CRY2) across cell clusters. Most of these genes showed high expression in the cluster enriched in glioma cells (GCs), indicating potential roles in tumor biology. However, GAS2L3, CRY2, and GNL1 were expressed in both tumor and non-tumor cells (Fig. 4c,d). To further investigate the roles of these genes within glioma cells, we subset the glioma cells and re-clustered them (Fig. 5a; Supplementary Fig. 4a and b).

Fig. 5

Glioma subcluster exploration. (a) UMAP plot illustrating glioma subclusters identified through clustering analysis. Seven distinct GCs are specified. (b) UMAP plot depicting the cell cycle phases of the cells included in the analysis. Cells are color-coded according to their respective cell cycle phases (G1, S, G2/M). (c) Heatmap displaying the top 5 marker genes for each GC. (d) UMAP plot showcasing the module scores of the four glioma cell states (AC, MES, NPC, OPC). (e) Quadrant plot illustrating the distribution of each GC across four quadrants, each representing one of the glioma cell states.

Based on this analysis, we identified seven distinct GCs with varying proportions across the tumor types (Supplementary Fig. 4c,d). Notably, GC03 and GC04 were characterized by the presence of dividing cells, indicating active cell proliferation (Fig. 5b). The remaining clusters were further characterized based on their top marker genes, with unique expression profiles visualized in the heatmap (Fig. 5c; Supplementary Table 5).
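The text does not state how cell cycle phases were assigned. Assuming Seurat's `CellCycleScoring()` was used, each cell receives an S score and a G2/M score (module scores for S-phase and G2/M-phase gene sets). Then, as we understand that function, the phase is called with the rule sketched below. The function name is hypothetical.

```python
def assign_phase(s_score, g2m_score):
    """Phase call mirroring the rule we understand Seurat's CellCycleScoring to use."""
    if s_score < 0 and g2m_score < 0:
        return "G1"  # neither proliferation signature is above background
    if s_score == g2m_score:
        return "Undecided"  # tie between the two signatures
    return "S" if s_score > g2m_score else "G2M"
```

Under this rule, clusters dominated by cells with positive S or G2/M scores, such as GC03 and GC04, appear as dividing populations.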

GC03 and GC04 exhibited high expression of proliferative markers, including TOP2A, NUSAP1, BIRC5, PBK, UBE2C, PCLAF, TYMS, CLSPN, MAD2L1, and CENPK, indicating a highly proliferative profile. GC00 was characterized by markers associated with an oligodendrocytic phenotype (OLIG1 and OLIG2), while GC02 displayed a mesenchymal profile, evidenced by the expression of NDRG1, VEGFA, AKAP12, and ADM. GC01 showed specific astrocytic markers, namely AQP4 and GJA1. GC05, although not strongly expressing its top markers, exhibited immune-related characteristics, highlighted by the presence of CG2, CD3D, CD3E, GZMH, and GZMK. Finally, GC06 was defined by markers associated with a neuroprogenitor state, including RND3, ELAVL4, STMN2, CD24, and INSM1.

Additionally, we classified these clusters using the functional modules described by Neftel et al.28, labeling GCs according to the oligodendrocyte precursor cell (OPC), astrocytic (AC), mesenchymal (MES), and neural precursor cell (NPC) states (Fig. 5d). From the module scores of these four states, we computed two-dimensional coordinates for each cell and plotted them in a quadrant plot to visualize each cluster’s profile (Fig. 5e). This approach showed how each cell cluster aligns with the OPC, AC, MES, and NPC profiles.

Despite some overlap, GC00 segregated distinctly into the OPC quadrant. In contrast, GC01 was associated mainly with the AC quadrant, and GC02 was located in the MES quadrant. GC03 and GC04 lacked a specific signature, as expected for clusters composed of cells in the G2/M and S phases. Finally, GC06 was characterized as an NPC cluster, although it also incorporated features from all four cellular profiles (Fig. 5d,e).

We further examined the expression of the eleven genes (OTP, GAS2L3, TXLNB, HOXD13, IRX1, GABRB3, LCE1C, CRTAC1, GNL1, SLITRK5, and CRY2) within the glioma subset using UMAP visualizations, to determine which clusters and cellular profiles express each gene (Fig. 6a,b). OTP expression was predominantly observed in GC06, which is associated with an NPC profile. In contrast, HOXD13 was not restricted to a specific cluster or profile. GAS2L3 was broadly expressed, with a notable concentration in GC03, which consists of cells in the G2 phase without a defined profile. Interestingly, OTP, GAS2L3, TXLNB, and HOXD13 showed low expression in the core of the OPC cluster, whereas GABRB3, CRTAC1, SLITRK5, and CRY2 were preferentially expressed in the OPC cluster.

Fig. 6

Gene expression across glioma subclusters (GCs). (a) Feature plot displaying the expression levels of eleven selected genes within the glioma subset. Each cell is color-coded according to the expression intensity of the respective gene. (b) Dotplot of the mean and percent expression of the eleven genes across GCs.

Discussion

Our study provides insights into the transcriptional programs associated with survival in patients with primary glioma and highlights specific prognostic genes, such as GAS2L3, HOXD13, and OTP, that may serve as references for developing therapeutic strategies. By analyzing data from TCGA and CGGA, we identified key BPs (e.g., synaptic integration and cell differentiation) likely involved in glioma progression and recurrence, focusing on the relationships between genes, regulons, and biological pathways. This work not only enhances our understanding of glioma but also points to a specific group of genes (OTP, GAS2L3, TXLNB, HOXD13, IRX1, GABRB3, LCE1C, CRTAC1, GNL1, SLITRK5, and CRY2) that may affect patient survival. These findings suggest that targeted interventions modulating cellular plasticity could help improve therapeutic outcomes.

Gliomas are known for their high heterogeneity, both within and between histological grades and subtypes29–33. To address this complexity, we reconstructed GRNs from two distinct datasets, enabling us to account for the variability inherent to these tumors34. Although the regulons themselves showed minimal overlap, we observed a remarkable degree of similarity in the genes within these regulons and in the BPs associated with them.

The discrepancy between the TCGA and CGGA regulons may be attributed to batch effects, which could introduce technical and biological variability, or to differences in population background; this requires further investigation. Meanwhile, the overlap of genes within the regulons revealed underlying processes that are consistent between the datasets. The difference in regulons may reflect the complexity of regulatory mechanisms in glioma, whereas the similarity in regulated genes suggests that networks can achieve the same biological results through different pathways, reinforcing the importance of central gene interactions over specific regulatory differences35.

Reconstructing GRNs from RNA-seq data sourced from TCGA and CGGA enabled us to identify a shared set of 162 genes across both datasets. These genes are involved in critical BPs, including synaptic function, cell cycle regulation, and developmental pathways. These findings underscore the parallels between glioma biology and normal neural embryonic development36–38, where synaptic communication plays a pivotal role in modulating cellular plasticity39,40. The disruption of early synaptic formation may trigger cascading effects on downstream pathways24,41–44, highlighting the central role of synaptic integration in both tumor initiation and progression.

Our detailed analysis of these 162 genes, using elastic net regularization and Cox regression, identified 31 genes associated with survival outcomes in TCGA and 32 in CGGA, with 11 genes common to both analyses. Among these, CRTAC1, GABRB3, GNL1, CRY2, and SLITRK5 were linked to better prognosis, whereas GAS2L3, HOXD13, TXLNB, OTP, and IRX1 were associated with poorer survival. Of these genes, OTP, HOXD13, IRX1, CRY2, GNL1, SLITRK5, GABRB3, CRTAC1, LCE1C, and GAS2L3 have been previously described as potential prognostic markers or as participants in glioma biology45–56, whereas TXLNB has not been reported in this context. Notably, the presence of three homeobox transcription factors, IRX1, OTP, and HOXD13, among the genes associated with poorer survival underscores the critical role of transcriptional regulation in maintaining cellular plasticity47,50,57,58. However, further studies are necessary to elucidate the functional relevance of these candidate genes through validation in independent cohorts and experimental models.

GAS2L3, a key cytoskeletal regulator, plays a pivotal role in crosslinking microtubules and actin filaments, which is essential for processes such as chromosome segregation and cell division59,60. Dysregulation of GAS2L3 can disrupt these mechanisms, potentially leading to genomic instability and enhancing the invasive characteristics of glioma cells61,62. Similarly, TXLNB may be involved in vesicle transport and nerve regeneration by promoting syntaxin binding, underscoring the interconnected roles of intracellular trafficking and cytoskeletal dynamics63. Together, these genes suggest a critical interplay between vesicle transport and cytoskeletal dynamics in promoting glioma cell adaptability and aggressiveness.

Furthermore, our individual gene analyses, adjusted for age and tumor grade, indicated that HOXD13, GAS2L3, and OTP significantly stratified survival groups, consistent with their previously reported relevance as prognostic molecules45,48,49,51. These findings suggest that the expression levels of these genes are associated with glioma aggressiveness and patient prognosis.

To assess gene expression heterogeneity within tumors, we examined expression at the single-cell level. Although most of these genes were expressed within glioma cells, GAS2L3, GNL1, and CRY2 were also expressed in immune cells. These results highlight potential interactions between glioma cells and the immune microenvironment, offering insights into the crosstalk that may influence tumor progression27,64–71.

The identification of seven distinct GCs at the single-cell level further elucidates the complexity of glioma biology. Using the cellular state framework established by Neftel et al.28, we characterized these clusters based on astrocytic, mesenchymal, neuroprogenitor, and oligoprogenitor states. Notably, we observed an association between the hazard ratios of the identified genes and their expression within the OPC cluster: genes with hazard ratios indicating a protective effect were predominantly expressed in this cluster. This suggests that the OPC state may confer a more favorable prognosis, potentially because of its association with less aggressive tumor phenotypes72–75.

In conclusion, our findings highlight the key roles of specific genes and processes in glioma biology, providing a foundation for future research into their regulatory mechanisms. We identify intricate relationships between regulons, genes, and biological processes that shape cell profiles, such as NPCs, MES, ACs, and OPCs (Fig. 7). Ten of the eleven genes studied have been previously linked to gliomas, with TXLNB being novel in this context. These genes shed light on crucial aspects of glioma malignancy, including synaptic integration, differentiation, and proliferation. Finally, the association between gene expression in the OPC cluster and survival outcomes suggests that targeting these pathways could improve therapeutic strategies and patient prognosis.

Fig. 7

Proposed mechanisms linking prognostic genes to glioma biology: schematic of the processes in glioma biology in which each prognostic gene is involved.

Methods

Transcriptome datasets

Transcriptional data from 989 primary gliomas (567 from TCGA and 422 from CGGA) were obtained from TCGA (https://tcga-data.nci.nih.gov/tcga/) and the CGGA (http://www.cgga.org.cn/)18. From TCGA, we selected the Lower Grade Glioma and Glioblastoma (GBMLGG) project, which comprises 661 primary tumor samples, of which 567 had the reclassification made available by Zakharova et al. This project gathers data from both the Glioblastoma and Lower Grade Glioma studies. The samples, obtained from tumor biopsies, are categorized into grades 02, 03, and 04 based on the 2021 WHO classification criteria76. Sequencing was performed using the Illumina HiSeq platform, and clinical data were retrieved for each sample.

From the CGGA, we selected the "mRNAseq_693 (batch 1)" dataset, which encompasses 693 glioma tumor biopsy samples18,77. To match the restriction to primary tumors applied to TCGA, we filtered the mRNAseq_693 dataset, resulting in 422 primary tumor samples spanning grades II, III, and IV. These samples were classified according to the WHO 2016 classification criteria and were also sequenced using the Illumina HiSeq platform.

Reconstruction of gene regulatory networks

To reconstruct transcriptional regulatory networks, we used the RTN package from Bioconductor in R version 4.3.2 with RStudio78,79. First, we normalized the data to counts per million (CPM) using the edgeR package80–82. Gene regulatory networks were reconstructed from the normalized expression data of the TCGA and CGGA datasets individually. This process yielded a network of regulons, in which each regulon consists of a TF and the genes it regulates; regulons are named after their TFs. Following reconstruction, we assessed regulon activity with the RTN package by performing a two-tailed GSEA (GSEA2) for each regulon, which summarizes the expression of the regulon's genes in each individual sample. The regulon network was visualized with the RedeR package, with support from igraph83,84.
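CPM scales each sample's counts by its library size so that every sample sums to one million. The sketch below shows the arithmetic for a genes × samples matrix. The optional `norm_factors` argument mirrors how edgeR forms effective library sizes (library size times normalization factor). The source does not state whether TMM or other normalization factors were applied, so treat that argument as an assumption. The function name is hypothetical.

```python
import numpy as np


def cpm(counts, norm_factors=None):
    """Counts per million for a genes x samples count matrix.

    If norm_factors is given, the effective library size is
    lib_size * norm_factors (the edgeR convention); otherwise raw column sums.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total counts per sample (column)
    if norm_factors is not None:
        lib_size = lib_size * np.asarray(norm_factors, dtype=float)
    return counts / lib_size * 1e6
```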

Identification of prognostic regulons via LASSO regularization

To identify significant predictors associated with survival outcomes, we fitted a LASSO Cox regression model using regulon activity as predictors. The glmnet package in R was used to fit the LASSO Cox regression model with cross-validation (cv.glmnet)85,86, with the family argument set to “cox” to indicate Cox regression. Cross-validation was used to select the optimal value of the penalty parameter (lambda): cv.glmnet identified the value (lambda_optimal) that yielded the minimum cross-validated error. The final regression model was then fitted with the glmnet function using lambda_optimal, and the coefficients corresponding to this lambda were extracted. These coefficients represent the strength and direction of the association between predictor variables and survival outcomes. Visualization was performed using the ggplot2 package.
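The following sketch illustrates the objective that a LASSO Cox model minimizes: the negative Cox log partial likelihood plus an L1 penalty of strength lambda. glmnet fits this with cyclical coordinate descent over a whole lambda path. Here we substitute proximal gradient descent (ISTA) at a single lambda and omit the cross-validation loop. We also assume there are no tied survival times. All function names are hypothetical.

```python
import numpy as np


def cox_gradient(beta, X, time, event):
    """Gradient of the mean negative Cox log partial likelihood (no tied times)."""
    order = np.argsort(-time)  # descending time: risk set of row i is rows 0..i
    X, event = X[order], event[order]
    eta = X @ beta
    w = np.exp(eta - eta.max())  # shift for numerical stability; cancels in the ratio
    risk_mean = np.cumsum(w[:, None] * X, axis=0) / np.cumsum(w)[:, None]
    return -(event[:, None] * (X - risk_mean)).sum(axis=0) / len(time)


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cox(X, time, event, lam, step=0.1, n_iter=2000):
    """ISTA: gradient step on the partial likelihood, then L1 soft-thresholding."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * cox_gradient(beta, X, time, event), step * lam)
    return beta
```

Once lambda reaches the largest absolute gradient at beta = 0, every coefficient stays exactly zero. Cross-validation searches the range below that value for the lambda with the lowest held-out partial likelihood deviance.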

Functional enrichment analysis

To characterize the genes in the selected regulons, we conducted functional enrichment analysis in the R environment using the clusterProfiler package87, focusing on common gene clusters. We identified and filtered enriched terms in the Gene Ontology database, separating analyses by database and regulon group and applying an adjusted p-value threshold of < 0.05. The results were presented as dot plots, offering an intuitive graphical representation of the enriched BPs associated with the analyzed gene groups.
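clusterProfiler's over-representation analysis is based on a one-sided hypergeometric test per term, followed by multiple-testing adjustment, with Benjamini–Hochberg (BH) as the default. The sketch below reproduces that logic in plain Python. The gene universe, the term-to-gene mapping, and the function names are illustrative assumptions. This is not clusterProfiler's code.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N universe genes, K annotated, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvals):
    """BH-adjusted p-values (same definition as R's p.adjust(method = "BH"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def over_representation(query, term_to_genes, universe):
    """Return {term: (p-value, BH-adjusted p-value)} for each annotation term."""
    universe = set(universe)
    query = set(query) & universe
    terms = sorted(term_to_genes)
    pvals = []
    for term in terms:
        annotated = set(term_to_genes[term]) & universe
        overlap = len(query & annotated)
        pvals.append(hypergeom_sf(overlap, len(universe), len(annotated), len(query)))
    return dict(zip(terms, zip(pvals, benjamini_hochberg(pvals))))
```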

Prognostic gene detection via elastic net regularization

We employed Elastic Net regression, a hybrid of LASSO and Ridge regression techniques, to identify significant predictors associated with glioma patient survival. The glmnet package in R facilitated the fitting of the Elastic Net regression model with cross-validation, utilizing the cv.glmnet function85,86. This approach allowed for simultaneous variable selection and regularization, enabling the identification of relevant features while mitigating potential issues such as multicollinearity. The optimal values of the regularization parameters (alpha and lambda) were determined through cross-validation to balance model complexity and predictive performance. The resulting coefficients from the Elastic Net regression model were interpreted to discern the strength and direction of the association between predictor variables and survival outcomes.
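glmnet parameterizes the elastic net penalty as lambda × ((1 − alpha)/2 × ||beta||² + alpha × ||beta||₁). Setting alpha = 1 gives LASSO and alpha = 0 gives ridge. The sketch below shows this penalty and its proximal operator: a soft-threshold step followed by a ridge-type shrinkage. In a proximal-gradient Cox fit, this operator would replace plain soft-thresholding. glmnet itself uses coordinate descent, and the function names are hypothetical.

```python
import numpy as np


def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style penalty: lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1)."""
    beta = np.asarray(beta, float)
    return lam * ((1 - alpha) / 2 * np.sum(beta ** 2) + alpha * np.sum(np.abs(beta)))


def elastic_net_prox(z, step, lam, alpha):
    """Proximal operator of step * elastic_net_penalty: soft-threshold, then ridge shrink."""
    z = np.asarray(z, float)
    shrunk = np.sign(z) * np.maximum(np.abs(z) - step * lam * alpha, 0.0)
    return shrunk / (1.0 + step * lam * (1 - alpha))
```

Note that `cv.glmnet` cross-validates lambda for a fixed alpha. Tuning alpha as well requires an outer loop over candidate alpha values, ideally reusing the same fold assignment through its `foldid` argument so the deviances are comparable.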

Evaluating gene prognostic values

We applied a proportional hazards regression model to associate gene expression with survival time and outcome. We used the coxph function from the R package survival, which fits the regression model to a survival object88. This model estimates the influence of each variable on survival outcomes over time, providing hazard ratios with confidence intervals (CIs) for each variable. The survminer package was used to plot Kaplan–Meier curves, and cutoff points were determined with its surv_cutpoint() function89,90. The Benjamini–Hochberg (BH) procedure was used to adjust for multiple hypothesis testing.
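To our understanding, `surv_cutpoint()` selects the expression cutoff that maximizes a log-rank-type statistic. It delegates to maximally selected rank statistics from the maxstat package and by default requires each group to hold at least 10% of samples (`minprop = 0.1`). The sketch below scans candidate cutpoints with the plain two-group log-rank chi-square statistic. The selected cutoff may therefore differ slightly from survminer's. The function names are hypothetical.

```python
import numpy as np


def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (group: boolean, True = 'high')."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    group = np.asarray(group, bool)
    observed_minus_expected, variance = 0.0, 0.0
    for t in np.unique(time[event]):  # distinct event times
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        died = (time == t) & event
        d, d1 = died.sum(), (died & group).sum()
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_cutpoint(marker, time, event, minprop=0.1):
    """Scan candidate cutpoints; return (cutpoint, statistic) maximizing the log-rank chi-square."""
    marker = np.asarray(marker, float)
    best_cut, best_stat = None, -np.inf
    for cut in np.unique(marker)[:-1]:
        high = marker > cut
        if minprop <= high.mean() <= 1 - minprop:  # both groups hold >= minprop of samples
            stat = logrank_chi2(time, event, high)
            if stat > best_stat:
                best_cut, best_stat = cut, stat
    return best_cut, best_stat
```

Because the cutpoint is chosen to maximize the statistic, the naive log-rank p-value at that cutpoint is optimistic. This is one reason the maxstat approach applies a corrected p-value.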

Single-cell RNA-seq analysis

Data for single-cell analysis were obtained from the Broad Institute Single Cell Portal, specifically from the study by Abdelfattah et al.27. The dataset comprises 201,986 cells, and we used the Seurat package for basic preprocessing, following the analytic pipeline described by Abdelfattah et al. Cells with mitochondrial content exceeding 20% and ribosomal content exceeding 50% were excluded. Harmony was used to correct batch effects related to patient and sex with the RunHarmony() function91.
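The quality-control filter can be sketched as per-cell percentages of counts from mitochondrial and ribosomal genes. The sketch assumes human gene symbols with the "MT-", "RPS", and "RPL" prefixes commonly used with Seurat's `PercentageFeatureSet()`. It also treats the two thresholds as independent exclusion criteria. Neither the exact patterns nor that reading is stated in the text. The function name is hypothetical.

```python
import numpy as np


def qc_keep_mask(counts, gene_names, max_pct_mito=20.0, max_pct_ribo=50.0):
    """Boolean mask of cells to keep (counts: cells x genes raw count matrix)."""
    counts = np.asarray(counts, dtype=float)
    names = np.asarray(gene_names, dtype=str)
    total = counts.sum(axis=1)  # total counts per cell
    mito = np.char.startswith(names, "MT-")  # assumed human mitochondrial prefix
    ribo = np.char.startswith(names, "RPS") | np.char.startswith(names, "RPL")
    pct_mito = 100.0 * counts[:, mito].sum(axis=1) / total
    pct_ribo = 100.0 * counts[:, ribo].sum(axis=1) / total
    return (pct_mito <= max_pct_mito) & (pct_ribo <= max_pct_ribo)
```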

We used the classifications provided by Abdelfattah et al. to subset only glioma cells. Normalization and clustering were then performed on the glioma subset. clustree was used to compare clusterings across resolutions, and clusters with fewer than 1,000 cells and no distinct separation in the embedding were excluded92. Cell states were scored with Seurat's AddModuleScore() function, and, to visualize each cluster’s profile, the scores were transformed into two-dimensional coordinates and plotted in a quadrant plot using ggplot2.
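One way to turn four state scores into quadrant-plot coordinates follows the layout we understand from Neftel et al. The y-axis separates OPC/NPC-like cells (above zero) from AC/MES-like cells (below zero). The x-axis places OPC and AC on the left and NPC and MES on the right, using a log-scaled score difference. The exact transformation used in this study is not stated. The sketch therefore assumes the four module scores (e.g., AddModuleScore outputs) are already computed per cell, and its function names are hypothetical.

```python
import math


def neftel_coordinates(opc, npc, ac, mes):
    """Map one cell's four state scores to (x, y) quadrant-plot coordinates."""
    y = max(opc, npc) - max(ac, mes)  # > 0: OPC/NPC half; otherwise AC/MES half
    if y > 0:
        x = math.log2(abs(opc - npc) + 1)
        x = -x if opc > npc else x  # OPC on the left, NPC on the right
    else:
        x = math.log2(abs(ac - mes) + 1)
        x = -x if ac > mes else x  # AC on the left, MES on the right
    return x, y


def quadrant(x, y):
    """Name the quadrant a point falls into (top-left OPC, top-right NPC, etc.)."""
    if y > 0:
        return "OPC" if x < 0 else "NPC"
    return "AC" if x < 0 else "MES"
```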