Introduction

Cancer is a highly heterogeneous disease driven by genetic, epigenetic, and transcriptomic alterations. Advances in high-throughput sequencing technologies have enabled the generation of multi-omics datasets, offering deeper insights into the mechanisms underlying cancer development and patient outcomes. Among these data types, gene expression profiling plays a central role: it allows researchers to monitor gene activity within specific tissues and cell populations and to distinguish cancerous from healthy cells1. Messenger RNA (mRNA) levels reflect active gene transcription under specific conditions, providing valuable information on tumor progression and cellular behavior2. Genes often display altered expression patterns in tumors compared to healthy tissues, revealing key molecular changes associated with cancer3. Analyzing such patterns aids in identifying cancer-specific genes and discovering potential biomarkers for early detection. The integration of gene expression data with DNA methylation and microRNA (miRNA) expression profiles has further advanced cancer research4. Combining molecular layers uncovers complex regulatory interactions that contribute to tumorigenesis5. DNA methylation profiling highlights epigenetic modifications that can silence tumor suppressor genes or activate oncogenes6, while miRNA expression analysis reveals critical mechanisms of post-transcriptional gene regulation involved in cancer progression7.

Despite the wealth of information provided by multi-omics data, extracting meaningful insights remains a major challenge due to the high dimensionality, feature heterogeneity, and complexity of genomic structures8,9. Traditional machine learning approaches, such as Support Vector Machines (SVMs) and Random Forests (RF), have shown potential for multi-omics-based cancer classification but often struggle with modeling the complex relationships in high-dimensional datasets and providing interpretable results10,11.

Recent advances in deep learning, particularly Graph Neural Networks (GNNs), have demonstrated strong capabilities in capturing complex biological interactions12,13. Unlike conventional models that rely on Euclidean-based representations, GNNs naturally encode the relationships among genes, proteins, and regulatory elements within a graph structure, offering a biologically meaningful approach14,15. The Graph Kolmogorov–Arnold Network (GKAN) represents a significant advancement: by applying Kolmogorov–Arnold representation theory to graph learning, GKAN enhances both model interpretability and flexibility through the use of trainable univariate functions on graph edges16,17. Furthermore, the incorporation of spline-based transformations allows for precise feature extraction and greater transparency, making GKAN particularly well-suited for biomarker discovery in cancer diagnosis.

This article introduces the Multi-Omics Graph Kolmogorov–Arnold Network (MOGKAN), a deep learning framework that integrates graph-based modeling of mRNA, miRNA, and DNA methylation data to classify 31 distinct cancer types. Protein-Protein Interaction (PPI) network information is used to define the graph structure of MOGKAN. The data preprocessing pipeline combines differential expression analysis, Linear Models for Microarray Analysis (LIMMA)18, and LASSO regression to extract the most informative multi-omics features. DESeq2 (ref. 19) is applied to mRNA data to identify genes exhibiting significant changes in expression levels. For DNA methylation data, LIMMA employs empirical Bayes methods to stabilize variance estimates and improve the detection of differential signals, particularly for genes with low expression levels. This approach enables the identification of differentially methylated regions with high sensitivity and specificity, providing a robust foundation for epigenetic research and the discovery of novel cancer biomarkers.

The primary contributions of this work are as follows:

  • Proposed MOGKAN, a novel deep learning framework for cancer classification with inherent feature interpretability through learnable activation functions.

  • Constructed a graph-based model integrating a PPI network graph structure with multi-omics data from mRNA, miRNA, and DNA methylation profiles. The combined use of DESeq2, LIMMA, and LASSO enabled the selection of biologically relevant features critical for cancer classification.

  • Identified key biomarkers driving cancer progression and validated their functional relevance through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses.

The rest of this paper is structured as follows. Section 2 discusses related work on graph-based network architectures and Kolmogorov–Arnold networks. Section 3 describes the datasets, preprocessing pipeline, multi-omics data integration, GKAN architecture, and experimental setup. Section 4 presents the experimental results and their analysis, together with biomarker discovery. Section 5 concludes the paper.

Related works

Graph Neural Networks (GNNs) have demonstrated significant success in modeling structured data derived from graph-based relationships. While traditional GNNs offer strong predictive performance, they often face two major limitations: limited scalability and limited interpretability. To address these challenges, recent work has explored Kolmogorov–Arnold Networks (KAN) as an alternative architecture. GKAN extends this concept by integrating KAN into graph learning tasks, replacing conventional linear weights with trainable univariate functions, thereby enhancing both model interpretability and flexibility.

Recent studies have increasingly adopted GKAN to improve feature representation and model interpretability in graph-based learning tasks. Zhang et al.16 introduced GraphKAN, replacing standard activation functions with KAN-based structures to enhance feature extraction, demonstrating superior performance in both node and graph classification tasks compared to traditional GNN architectures. Similarly, Kiamari et al.20 incorporated spline-based activation functions between graph layers, achieving high performance across a variety of graph network types. Kolmogorov–Arnold Graph Neural Networks (KAGNNs) proposed by Bresson et al.21 extended message-passing operations using principles from the Kolmogorov–Arnold theorem to improve graph learning. Carlo et al.22 further refined GKAN by applying spline-based activation functions directly to graph edges, boosting both predictive accuracy and interpretability.

Beyond methodological improvements, several studies have successfully employed GKAN-based architectures in molecular and biomedical domains. Ahmed et al.23 demonstrated that GKANs can accurately predict small molecule–protein interactions, highlighting their potential in drug discovery. Li et al.24 developed GNN-SKAN, an architecture combining Swallow-KAN (SKAN) with basic GNNs, achieving state-of-the-art results across multiple molecular datasets. The increasing complexity of biomedical data in multi-omics cancer classification presents a compelling opportunity for GKAN to advance cancer-type prediction as a rapidly evolving research area. GKAN networks are particularly well-suited for applications that demand transparent, interpretable models, as they provide explicit insights into prediction decisions.

Recent approaches such as scRGCL25 and scMGATGRN26 combined GNNs with contrastive learning and multiview mechanisms for improved interpretability and learning high-order biological relationships in single-cell transcriptomic data. Furthermore, scAMZI27 presented attention-based autoencoders as an additional scRNA-seq clustering method, and iCRBP-LKHA28 used hybrid attention and deep convolutional kernels to predict circRNA-RBP interactions. While these models are employed for cell-type annotation or molecular interaction predictions, MOGKAN is scaled to multi-omics integration and cancer-type classification and enhances both performance and interpretability.

Materials and methods

Dataset

This study utilizes data from the Pan-Cancer Atlas database29, which comprises genomic, transcriptomic, and epigenomic information across a wide range of cancer types. Access to this comprehensive database is facilitated by Genomic Data Commons (GDC), which provides streamlined data retrieval through query tools, such as the TCGAbiolinks package29. Developed by the National Cancer Institute, GDC serves as a standardized platform that promotes collaborative cancer research by enabling consistent and cohesive data sharing. For the analysis in this work, the extracted data include 9,171 DNA methylation samples, 10,668 mRNA expression samples, and 10,465 miRNA expression samples. The number of omics samples for 33 types of cancer and normal tissue is detailed in Table 1 (ref. 29). The compiled multi-omics data resource from the Pan-Cancer Atlas database enables the investigation of complex biological relationships and supports identification of biomarkers for studying tumor biology and clinical outcomes.

Table 1 Number of samples per cancer type and normal tissues of TCGA multi-omics data (mRNA, miRNA, and DNA methylation) used in this study.

Data preprocessing

The data preprocessing pipeline in our work integrates dimensionality reduction techniques and feature selection methods to handle high-dimensional data and identify biologically relevant features in omics data. Specifically, we employed LIMMA and differential gene expression (DGE) analysis for feature selection. DGE analysis based on DESeq2 was applied to mRNA expression data, employing a negative binomial model to detect genes with significant expression changes30,31. LIMMA was used to analyze DNA methylation data and identify differentially methylated CpG sites31. To further reduce data dimensionality, we applied LASSO regression to mRNA and DNA methylation data32. The flow chart for data preprocessing is depicted in Fig. 1, with the following sections outlining the phases in the data processing workflow.

Fig. 1 Flow chart for data preprocessing.

Differential gene expression (DGE) analysis

In genomics, DGE profiling is frequently used to compare the expression levels of genes in a particular organism under various settings or conditions (e.g., normal versus cancer, treatment versus control, etc.)33. The analysis helps elucidate gene regulation mechanisms, environmental influences on gene activity, and a variety of other underlying biological processes. In our investigation, we employed DESeq2 to perform differential gene expression analysis on the mRNA data. DESeq2 models gene-level count data using a negative binomial distribution, which effectively accounts for both biological variability and overdispersion. We assessed statistical significance with the Wald test, using p-values derived from the Wald statistic to evaluate whether the estimated log fold changes differ significantly from zero. To identify genes potentially relevant to the biological processes under study, we applied a p-value threshold of 0.001.
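The Wald test step can be illustrated with a minimal Python sketch. It assumes that a log2 fold change and its standard error have already been estimated for each gene (in practice by DESeq2 in R); the gene names, values, and the helper names `wald_test` and `select_significant` are hypothetical and are not part of the original pipeline.

```python
import math

def wald_test(log2_fold_change, standard_error):
    """Return (z, two-sided p-value) for H0: log2 fold change == 0."""
    z = log2_fold_change / standard_error
    # Two-sided tail of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

def select_significant(results, alpha=0.001):
    """Keep gene IDs whose Wald p-value is below alpha."""
    selected = []
    for gene, (lfc, se) in results.items():
        _, p = wald_test(lfc, se)
        if p < alpha:
            selected.append(gene)
    return selected

# Hypothetical per-gene estimates: (log2 fold change, standard error)
toy_results = {"GENE_A": (2.4, 0.4), "GENE_B": (0.3, 0.5), "GENE_C": (-1.8, 0.3)}
significant_genes = select_significant(toy_results)
```

The default `alpha=0.001` mirrors the p-value threshold stated above; the sketch does not reproduce the dispersion estimation that DESeq2 performs before the test.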

LIMMA

For differential methylation analysis, we applied the LIMMA technique by fitting a linear model to the methylation levels of CpG sites as a function of experimental sample groups34. The initial dataset derived from the Human Methylation 450K (HM450) array included 485,577 features across 9,171 samples35 (Table 2). Using LIMMA, we identified CpG sites that are significantly differentially methylated in tumor samples compared to normal controls. For each CpG site, LIMMA computes a moderated t-statistic and an effect size that captures the relative methylation differences between groups. The corresponding p-value indicates the statistical significance of each comparison. After applying a p-value cutoff of 0.05, the number of CpG features was reduced to 139,321, representing the most notable methylation alterations associated with the disease state.
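The moderated t-statistic can be sketched as follows. In the empirical Bayes approach, the site-wise sample variance is shrunk toward a prior variance, with weights given by the respective degrees of freedom. The sketch assumes the prior variance and prior degrees of freedom are supplied as inputs (LIMMA estimates them from all sites); the function name and the numerical values are hypothetical.

```python
import math

def moderated_t(effect, sample_var, df_resid, prior_var, df_prior, unscaled_sd):
    """Moderated t-statistic for one CpG site.

    The site-wise variance is shrunk toward a prior variance, weighted by the
    respective degrees of freedom, before forming the t-statistic.
    """
    posterior_var = (df_prior * prior_var + df_resid * sample_var) / (df_prior + df_resid)
    t_mod = effect / (math.sqrt(posterior_var) * unscaled_sd)
    return t_mod, posterior_var

# Hypothetical values for a single site
t_value, shrunken_var = moderated_t(
    effect=1.0, sample_var=3.0, df_resid=4, prior_var=1.0, df_prior=4, unscaled_sd=0.5
)
```

Because the shrunken variance borrows information across sites, sites with few samples or unusually small sample variances receive more stable test statistics.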

LASSO Regression

LASSO regression is a linear regression technique that incorporates L1-norm regularization to enhance model performance and reduce overfitting. The algorithm minimizes the sum of squared residuals while imposing a penalty proportional to the absolute values of the model coefficients. This penalty encourages sparsity by shrinking some coefficients to exactly zero, effectively performing feature selection by eliminating less important variables. The LASSO objective function is given by the following equation:

$$\min_{\beta}\sum_{i=1}^{n}\left(y_{i}-\sum_{j=1}^{p}x_{ij}\beta_{j}\right)^{2}+\lambda\sum_{j=1}^{p}\left|\beta_{j}\right|$$
(1)

where \(y_{i}\) is the observed response variable for the \(i\)th sample, \(x_{ij}\) denotes the feature values, \(\beta_{j}\) are the regression coefficients, \(\lambda\) is the regularization parameter that controls the degree of sparsity, \(n\) is the number of samples, and \(p\) is the number of features.
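To make the sparsity effect of the objective in (1) concrete, the following is a minimal coordinate-descent sketch in pure Python. It is an illustration under stated assumptions rather than the solver used in this work: the function names and the toy data are hypothetical, and the objective is taken exactly as written in (1), without the 1/(2n) scaling that some libraries apply.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize sum_i (y_i - sum_j x_ij * b_j)^2 + lam * sum_j |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            z = sum(X[i][j] ** 2 for i in range(n))
            # With the squared loss left unscaled, the threshold is lam / 2
            beta[j] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return beta

# Toy design with orthogonal columns; the weak second feature is zeroed out
X_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [3.0, 0.2]
beta_toy = lasso_coordinate_descent(X_toy, y_toy, lam=1.0)
```

Features whose coefficients are driven to zero are the ones discarded during feature selection; larger values of the regularization parameter discard more features.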

Multi-Omics data integration

To integrate mRNA (RNA-Seq), miRNA, and DNA methylation data into unified records, we used sample IDs as the linkage element. An inner join operation was performed on the common sample IDs across the three omics datasets, retaining only those samples that have complete data for all modalities. Cancer types lacking any of the omics layers were excluded from further analysis. Specifically, two cancer types, LAML and GCT (Table 1), were excluded due to missing RNA-Seq and miRNA data, respectively. The final integrated dataset contains 8,464 samples spanning 31 cancer types and corresponding normal tissues, encompassing a total of 2,794 omics features (as summarized in Table 2).
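The inner join on sample IDs can be sketched as below. The sketch assumes each modality is held as a mapping from sample ID to a list of feature values; the sample IDs, values, and the function name `integrate_omics` are hypothetical.

```python
def integrate_omics(mrna, mirna, methylation):
    """Inner-join three {sample_id: feature_list} tables on sample ID.

    Only samples present in all three modalities are kept; their features are
    concatenated in the order mRNA, miRNA, DNA methylation.
    """
    common_ids = sorted(set(mrna) & set(mirna) & set(methylation))
    return {sid: mrna[sid] + mirna[sid] + methylation[sid] for sid in common_ids}

# Hypothetical sample IDs and feature values
mrna_toy = {"S1": [5.1, 0.3], "S2": [4.8, 0.9], "S3": [6.0, 0.1]}
mirna_toy = {"S1": [1.2], "S3": [0.7]}
meth_toy = {"S1": [0.85, 0.10], "S2": [0.40, 0.55], "S3": [0.33, 0.91]}
integrated = integrate_omics(mrna_toy, mirna_toy, meth_toy)
```

In this toy example, sample S2 is dropped because it has no miRNA record, which mirrors how incomplete samples are removed from the integrated dataset.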

Table 2 Pipeline for data processing.

Besides the early integration strategy used here, in which multi-omics modalities are concatenated before being passed to a graph-based model, other integration strategies have been applied in prior works. Picard et al.36 grouped multi-omics integration into five types: early, mixed, intermediate, late, and hierarchical integration. These strategies involve trade-offs among model complexity, interpretability, and the ability to preserve modality-specific signals. Although simple and popular, early integration may not be ideal for handling heterogeneity and differences in feature dimensionality between omics layers. Conversely, mixed or intermediate integration can maintain modality-specific structures and enable more flexible modeling pipelines. Alternative strategies, such as late or hierarchical integration, where each omics type has its own encoder and the encodings are subsequently fused, may also enhance interpretability and predictive power. Different integration strategies will be investigated in future versions of our framework.

Graph Kolmogorov–Arnold networks (GKAN)

GKAN is a neural architecture that extends the Kolmogorov–Arnold representation theorem to graph-structured data, offering an alternative to traditional GNNs. Unlike traditional GNNs, which rely on message passing, GKAN utilizes functional decomposition to model interactions within graphs. By decomposing node and edge relationships into hierarchical, learnable transformations, GKAN can capture long-range dependencies, as supported by the Kolmogorov–Arnold representation, while its learned edge activations address the over-smoothing problem that often affects GNNs37. GKAN expresses multi-dimensional functions as summations of nonlinear one-dimensional functions, enabling the adaptive transformation of node embeddings based on information from neighboring nodes. This is grounded in the Kolmogorov–Arnold theorem, which states that any continuous multivariate function \(f:\mathbb{R}^{d}\to\mathbb{R}\) can be decomposed as:

$$f\left(x_{1},x_{2},\dots,x_{d}\right)=\sum_{q=1}^{2d+1}g_{q}\left(\sum_{p=1}^{d}h_{q,p}\left(x_{p}\right)\right)$$
(2)

where \(g_{q}\) and \(h_{q,p}\) are learnable nonlinear functions, \(x_{p}\) denotes the input features, and \(d\) is the input feature dimension.
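As a small worked illustration of the form in (2), the product of two inputs can be written as a sum of univariate outer functions applied to sums of univariate inner functions, using the identity x1·x2 = ((x1 + x2)² − (x1 − x2)²)/4. The sketch below is illustrative only: the function name `ka_evaluate` is hypothetical, the functions are fixed rather than learned, and only two of the 2d + 1 outer terms are needed (the remaining terms can be taken as zero).

```python
def ka_evaluate(outer_fns, inner_fns, x):
    """Evaluate sum_q g_q( sum_p h_{q,p}(x_p) ).

    outer_fns[q] is g_q; inner_fns[q][p] is h_{q,p}.
    """
    total = 0.0
    for g_q, h_q in zip(outer_fns, inner_fns):
        inner_sum = sum(h_qp(x_p) for h_qp, x_p in zip(h_q, x))
        total += g_q(inner_sum)
    return total

# x1 * x2 = ((x1 + x2)^2 - (x1 - x2)^2) / 4, written in the form above
outer = [lambda u: u * u / 4.0, lambda u: -u * u / 4.0]
inner = [
    [lambda a: a, lambda b: b],    # h_{1,1}, h_{1,2}
    [lambda a: a, lambda b: -b],   # h_{2,1}, h_{2,2}
]
product = ka_evaluate(outer, inner, (3.0, 4.0))
```

In a KAN, the fixed lambdas above are replaced by trainable univariate functions, typically parameterized by splines.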

For a given graph \(G=(V,E)\), where \(V\) is the set of nodes and \(E\) is the set of edges, the node features \(h_{v}^{(l)}\) at a layer \(l\) are updated using:

$$h_{v}^{(l+1)}=\sum_{q=1}^{2d+1}g_{q}\left(\sum_{u\in\mathcal{N}(v)}h_{q,p}\left(h_{u}^{(l)}\right)\right)$$
(3)

In (3), \(h_{v}^{(l)}\) denotes the feature representation of node \(v\) at layer \(l\), \(\mathcal{N}(v)\) represents the set of neighboring nodes of \(v\), and \(g_{q}\) and \(h_{q,p}\) are trainable transformation functions applied to graph features. After several layers of hierarchical transformations, the final node representation is obtained as:
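The neighborhood update in (3) can be sketched on a toy graph. The sketch makes several simplifying assumptions that are not taken from the original implementation: each node carries a single scalar feature (so the feature index is dropped), the univariate functions are fixed stand-ins rather than trained splines, and the graph, values, and function name `gkan_node_update` are hypothetical.

```python
import math

def gkan_node_update(features, neighbors, outer_fns, inner_fns):
    """One layer of h_v <- sum_q g_q( sum_{u in N(v)} h_q(h_u) ) with scalar features."""
    updated = {}
    for v, nbrs in neighbors.items():
        total = 0.0
        for g_q, h_q in zip(outer_fns, inner_fns):
            aggregated = sum(h_q(features[u]) for u in nbrs)
            total += g_q(aggregated)
        updated[v] = total
    return updated

# Hypothetical 3-node path graph A - B - C with one scalar feature per node
features = {"A": 1.0, "B": 2.0, "C": 3.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# Fixed stand-ins for the trainable univariate functions
outer_fns = [math.tanh, lambda s: 0.5 * s]
inner_fns = [lambda h: h, lambda h: h * h]
next_features = gkan_node_update(features, neighbors, outer_fns, inner_fns)
```

Stacking several such updates lets information from more distant nodes reach each node, one hop per layer.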

$$\hat{y}_{v}=\sigma\left(\sum_{q=1}^{2d+1}g_{q}\left(\sum_{p=1}^{d}h_{q,p}\left(h_{v}^{(L)}\right)\right)\right)$$
(4)

where \(\hat{y}_{v}\) is the predicted class or regression output for node \(v\), \(\sigma\) is an activation function such as softmax (for multi-class classification) or sigmoid (for binary prediction), and \(L\) is the total number of layers in the network.

Graph structure

Protein-protein interactions are fundamental to biological systems, as they represent physical contacts or functional relationships between two or more protein molecules. The interactions play a central role in regulating cellular processes. In this study, we constructed a PPI network using the STRING database, which integrates both experimentally validated and computationally predicted protein interaction data38. STRING aggregates data from diverse biological sources, including high-throughput experimental assays, curated pathway databases, co-expression analyses, and text-mined associations extracted from scientific literature39. To enhance the reliability of interaction data, STRING assigns confidence scores to each interaction based on the strength and consistency of supporting evidence, thereby improving the robustness of biological network analyses37.

For our analysis, we focused on constructing a disease-specific protein network that connects proteins associated with multi-omics-derived genes, including mRNA, miRNA, and DNA methylation profiles. We utilized the STRING API (version 11.5) to automatically retrieve Homo sapiens (NCBI Taxonomy ID: 9606) protein interaction data. The results in TSV format were accompanied by confidence scores derived from co-expression data, experimental findings, curated databases, and literature mining. The resulting graph based on PPI networks was afterward used as input to the MOGKAN model, enabling biological graph representations that support cancer classification and biomarker discovery.

Experimental setting

The pipeline of the proposed framework is illustrated in Fig. 2. To construct the graph structure, we utilized a PPI-based edge index, derived by identifying highly interactive proteins within the PPI network. The selection is based on protein frequency counts, where only proteins appearing at least 200 times in the dataset are retained, ensuring the inclusion of biologically significant hub proteins with prominent roles in cellular function and interaction networks. This filtering step also helps control the size of the graph and the amount of data it contains. Excluding weakly connected proteins removes isolated nodes from the graph, resulting in a more cohesive and interpretable graph structure.
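The frequency-based hub selection can be sketched as follows. The sketch assumes that a protein's frequency is the number of times it appears as an endpoint in the interaction list, and that an edge is kept only when both endpoints pass the threshold; both points are interpretations for illustration, and the function name and toy edges are hypothetical.

```python
from collections import Counter

def filter_hub_edges(edges, min_count=200):
    """Keep edges whose two endpoints each appear at least min_count times."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    hubs = {protein for protein, c in counts.items() if c >= min_count}
    kept = [(a, b) for a, b in edges if a in hubs and b in hubs]
    # Map protein names to integer node indices for an edge-index representation
    node_index = {protein: i for i, protein in enumerate(sorted(hubs))}
    edge_index = [(node_index[a], node_index[b]) for a, b in kept]
    return hubs, edge_index

# Toy interaction list; a threshold of 2 stands in for the threshold of 200
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
toy_hubs, toy_edge_index = filter_hub_edges(toy_edges, min_count=2)
```

In the toy example, P4 appears only once and is removed together with its single edge, so no isolated node remains in the filtered graph.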

The layers in the MOGKAN architecture are depicted in Fig. 2. Multi-omics data for each sample, including mRNA, miRNA, and DNA methylation, are integrated into a unified feature vector that is assigned to the corresponding nodes in the graph. Information is propagated across the network using GATConv layers40, which dynamically assign weights to neighboring nodes based on an attention mechanism learned during training. This allows the model to prioritize the most informative interactions by computing relevance scores for each neighbor. Through multiple GAT layers, the network iteratively refines node representations by aggregating attention-weighted messages, effectively capturing both biological signals and meaningful connectivity patterns. The resulting node embeddings encode local gene-specific characteristics while also reflecting the broader structure of PPI networks.
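The attention-weighted aggregation performed by a graph attention layer can be sketched for a single node and a single head. In the original GAT formulation, a raw score for each neighbor is computed from a learned vector applied to the transformed features of the two endpoints; here the raw scores are simply given as hypothetical inputs, features are scalars, and the function names are illustrative.

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU with a small negative slope."""
    return x if x > 0 else slope * x

def attention_weights(raw_scores):
    """Softmax over LeakyReLU-activated neighbor scores; weights sum to 1."""
    activated = [leaky_relu(s) for s in raw_scores]
    shift = max(activated)  # subtract the maximum for numerical stability
    exps = [math.exp(a - shift) for a in activated]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(neighbor_features, raw_scores):
    """Attention-weighted sum of scalar neighbor features."""
    weights = attention_weights(raw_scores)
    return sum(w * h for w, h in zip(weights, neighbor_features))

# Hypothetical raw scores for three neighbors of one node
alphas = attention_weights([2.0, 0.0, -5.0])
```

Neighbors with higher scores receive larger weights, which is how the layer prioritizes the most informative interactions; with several heads, this computation is repeated with separate learned parameters per head.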

Table 3 Grid search results for hyperparameter optimization of the MOGKAN model on multi-omics data.

To fine-tune the hyperparameters, we employed a grid search strategy over the learning rate, weight decay, dropout rate, and number of attention heads. Model training was performed using the Adam optimizer over 100 epochs. Table 3 shows the top 10 hyperparameter combinations that yielded the highest mean accuracy and F1-score during grid search optimization of the MOGKAN model on multi-omics data. Several configurations were evaluated by varying the main architecture and training settings, such as the hidden dimension size, number of attention heads, number of hidden layers, dropout rate, learning rate, and L2 regularization strength. The model achieved consistently high performance across multiple settings, obtaining mean accuracies of 96.1% and F1-scores exceeding 95%. The best results were obtained with a hidden dimension of 2048, four attention heads, two hidden layers, a dropout rate of 0.2, a learning rate of 0.0001, and an L2 penalty of 0.0001, yielding an accuracy of 96.17% and an F1-score of 95.12%.
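The grid search procedure can be sketched as an exhaustive loop over hyperparameter combinations. The candidate values in `param_grid` are hypothetical (only the best configuration is reported in the text), and `mock_evaluate` is a stand-in scoring function that would be replaced by cross-validated training of the model.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid and return results sorted best-first."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((evaluate(config), config))
    results.sort(key=lambda item: item[0], reverse=True)
    return results

# Hypothetical candidate values for four of the tuned hyperparameters
param_grid = {
    "learning_rate": [1e-3, 1e-4],
    "weight_decay": [1e-3, 1e-4],
    "dropout": [0.2, 0.5],
    "heads": [2, 4],
}

def mock_evaluate(config):
    """Stand-in for mean cross-validated accuracy; replace with real training."""
    target = {"learning_rate": 1e-4, "weight_decay": 1e-4, "dropout": 0.2, "heads": 4}
    return sum(config[k] == v for k, v in target.items()) / len(target)

ranked = grid_search(param_grid, mock_evaluate)
best_score, best_config = ranked[0]
```

The number of training runs grows multiplicatively with the number of candidate values per hyperparameter, which is why only a small set of values per parameter is usually searched.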

As depicted in Fig. 2, the framework employs 5-fold cross-validation to ensure robust evaluation across different data splits. The multi-omics data are processed by two consecutive GAT layers. The first GAT layer applies four attention heads, each producing a 2048-dimensional representation, and the second GAT layer sums the outputs and applies a LeakyReLU activation. The output is then passed through three Kolmogorov–Arnold Network (KAN) layers that apply nested nonlinear transformations (ψ → tanh → ϕ → tanh), each followed by batch normalization, LeakyReLU activation, and dropout for regularization. Feature dimensions are progressively reduced from the initial hidden size to 1024 and then to 512. Lastly, a linear classifier maps each resulting feature vector to 32 classes (31 cancer types plus normal tissue). This hybrid design enables MOGKAN to capture both topological dependencies and complex nonlinear patterns in multi-omics cancer data.

Performance evaluation was conducted using standard classification metrics, including accuracy, precision, recall, and F1-score, averaged across the cross-validation folds. To interpret the model’s predictions, feature importance was assessed based on activations in the model’s first GAT layer. For every attention head, the model learns a set of coefficients that determine the importance of neighboring genes (nodes) when aggregating data. To estimate the importance of each gene, we averaged the attention scores assigned to it across all heads of the first GAT layer; compared with inspecting a single head, this gives a more comprehensive view of the model’s attention mechanism. After training, we obtained the average attention weights assigned to each node across all samples and heads. Genes that consistently received high attention scores across different samples were considered more influential, as they contributed more significantly to feature propagation and decision-making within the graph. The most influential features were mapped to their corresponding genes using a BioMart query, providing enhanced biological context and insight into their relevance in cancer classification.
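The averaging of attention scores over samples and heads can be sketched as below. The sketch assumes the attention received by each node has already been extracted into a nested list indexed by sample, head, and node; this data layout, the toy values, the gene names, and the function names are hypothetical.

```python
def node_importance(attention):
    """Average attention per node over samples and heads.

    attention[s][h][v] is the attention received by node v under head h for sample s.
    """
    n_samples = len(attention)
    n_heads = len(attention[0])
    n_nodes = len(attention[0][0])
    scores = [0.0] * n_nodes
    for sample in attention:
        for head in sample:
            for v, a in enumerate(head):
                scores[v] += a
    return [s / (n_samples * n_heads) for s in scores]

def top_k(scores, names, k):
    """Return the k node names with the highest average attention."""
    order = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)
    return [names[v] for v in order[:k]]

# Hypothetical attention for 2 samples, 2 heads, and 3 nodes
toy_attention = [
    [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3]],
    [[0.3, 0.4, 0.3], [0.2, 0.7, 0.1]],
]
toy_scores = node_importance(toy_attention)
toy_top = top_k(toy_scores, ["GENE_A", "GENE_B", "GENE_C"], k=2)
```

Ranking nodes by this average is what yields a list of candidate genes that can then be mapped to gene identifiers.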

Performance metrics

For model evaluation, we applied standard performance metrics for multi-class classification tasks, including accuracy, precision, recall, and F1-score. Accuracy measures the overall correctness of the model and is defined by the following equation:

$$\text{Accuracy}=\frac{\sum_{i=1}^{N}TP_{i}}{\sum_{i=1}^{N}\left(TP_{i}+FN_{i}\right)}$$
(5)

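Here, \(N\) is the number of classes, and \(TP_{i}\), \(FP_{i}\), and \(FN_{i}\) are the numbers of true positives, false positives, and false negatives for class \(i\). Macro-averaging computes precision, recall, and F1-score for each class and then takes their unweighted mean, so that no class is favored: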

$$\text{Macro Precision}=\frac{1}{N}\sum_{i=1}^{N}\frac{TP_{i}}{TP_{i}+FP_{i}}$$
(6)
$$\text{Macro Recall}=\frac{1}{N}\sum_{i=1}^{N}\frac{TP_{i}}{TP_{i}+FN_{i}}$$
(7)
$$\text{Macro F1-score}=\frac{1}{N}\sum_{i=1}^{N}\frac{2\times\text{Precision}_{i}\times\text{Recall}_{i}}{\text{Precision}_{i}+\text{Recall}_{i}}$$
(8)
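The metrics above can be computed from a multi-class confusion matrix, as in the following minimal sketch. Accuracy is computed here as the fraction of correctly classified samples, and the macro-averaged metrics follow (6) to (8); the function name and the toy confusion matrix are hypothetical.

```python
def macro_metrics(confusion):
    """Accuracy and macro precision/recall/F1 from a confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precisions, recalls, f1_scores = [], [], []
    for i in range(n):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(n)) - tp  # predicted i, true class differs
        fn = sum(confusion[i]) - tp                        # true i, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": sum(confusion[i][i] for i in range(n)) / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class)
toy_confusion = [[5, 1, 0], [1, 3, 1], [0, 0, 4]]
toy_metrics = macro_metrics(toy_confusion)
```

Because every class contributes equally to the macro averages, poor performance on a small class lowers the macro scores even when overall accuracy remains high.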
Fig. 2 The architecture of the Multi-Omics Graph Kolmogorov–Arnold Network (MOGKAN) for multiclass cancer classification.

Results and discussion

The performance of the proposed MOGKAN framework, evaluated using 5-fold cross-validation, is summarized in Table 4. MOGKAN achieved a classification accuracy of 96.28% across 32 classes (31 cancer types and normal tissue) by integrating mRNA, miRNA, and DNA methylation data. The results demonstrate performance improvement ranging from 1.58 to 7.30% compared to related works employing deep learning architectures based on Convolutional Neural Network (CNN), Graph Convolutional Neural Network (GCNN), and Graph Transformer Network (GTN). In particular, Mostavi et al.41 achieved 95.70% accuracy using a CNN-based model, Ramirez et al.42 reported 94.61% accuracy with a GCNN-PPI approach, whereas Kaczmarek et al.43 implemented a GTN model that achieved 93.56% accuracy. Moreover, MOGKAN exhibits improved reliability, as evidenced by its low standard deviation of ±0.0035 across the folds, which stands out against the variability in the results of related works.

Table 4 Experimental results for related deep learning methods for cancer classification with omics data.

Table 5 presents the experimental evaluation of MOGKAN with single-omics and multi-omics data for classification of 31 cancer types and normal tissues. Among the single-omics inputs, the model trained with mRNA data achieved the highest accuracy of 0.9562 along with 0.9524 precision, 0.9357 recall, and 0.9414 F1-score. The multi-omics MOGKAN model trained with combined DNA methylation and miRNA data performed similarly to the single-omics models, although the results remain slightly below top performance levels. The combined use of mRNA, DNA methylation, and miRNA data resulted in the most effective performance including 0.9628 accuracy, 0.9582 precision, 0.9445 recall, and 0.9489 F1-score. These results confirm that the integration of multiple omics modalities enhances the performance across all evaluation metrics.

Table 5 Experimental results for the proposed MOGKAN approach with single-omics and multi-omics data.

Notably, the removal of miRNA data resulted in the most significant drop in model performance, compared to excluding either mRNA or DNA methylation data. While mRNA is typically considered central to assessing gene expression and tumor signals, several biological and technical factors explain the impact of miRNA. Biologically, miRNAs play a critical role in post-transcriptional gene regulation, often acting as oncogenes or tumor suppressors. Dysregulation of miRNAs can influence the expression of numerous genes simultaneously, making them key indicators of cancer progression and subtype differentiation. Additionally, miRNAs are fewer in number and exhibit more stable expression patterns than mRNAs, which may help the model identify more robust and generalizable patterns. From a technical standpoint, the attention mechanism of the MOGKAN model assigns dynamic weights to input features. The stronger influence of miRNA features suggests that they contributed more prominently during training, allowing the model to focus on their informative value more effectively.

To further evaluate the generalization capability of MOGKAN, we performed a type-blind evaluation that withheld TCGA-LUAD, TCGA-LUSC, and TCGA-PRAD during training and used them only for testing. Table 6 demonstrates that the model maintains good predictive performance even in this rigorous evaluation. With the TCGA-PRAD cancer type the model achieved an accuracy of 0.9847, precision of 0.9973, and recall of 0.9620, indicating excellent generalization. With the TCGA-LUAD cancer type the model also showed robust performance with an accuracy of 0.9590, and an F1 score of 0.9393. Although the model performance on TCGA-LUSC was comparatively lower (accuracy = 0.9237, recall = 0.7945), it still reflects effective model generalization under type-blind conditions.

Table 6 Per-class performance metrics under type-blind evaluation settings.

Table 7 lists the top 10 biomarkers identified by the MOGKAN framework based on feature importance. The importance of each feature was quantified using the absolute sum of weights from the model’s linear transformation layer, enabling a discriminative selection of key biomarkers. BioMart was employed to map the features to their corresponding gene identifiers, enhancing biological interpretability. The ten identified biomarkers MCL1, LINC01410, GALNT6, MAML3, ITGB3, LINC01090, PKDCC, PCAT14, KIF16B, and PITPNM3 showed cancer-specific functional patterns that align with known mechanisms of carcinogenesis. For instance, MCL1 was reported to contribute to therapy resistance in breast cancer by regulating mitochondrial oxidative phosphorylation activity (PMID: 28978427)44. GALNT6 and PITPNM3 have been identified as dual-function proteins promoting both epithelial-mesenchymal transition and immune evasion (PMIDs: 39245709, 21481794)45,46. Several long non-coding RNAs, including LINC01410 (PMID: 32104067), LINC01090 (PMID: 34550610), and PCAT14 (PMID: 35003397), were found to participate in ceRNA regulatory networks. Notably, PCAT14 exhibited the highest diagnostic precision for prostate cancer47,48,49. Furthermore, ITGB3 and KIF16B were implicated in extracellular vesicle-mediated communication, with evidence supporting their role as potential biomarkers for metastatic colorectal cancer (PMIDs: 37040507, 35487942)50,51. The activity of MAML3 is regulated by hypoxia-inducible factors, activating Hedgehog (HH) and NOTCH pathways in gallbladder cancer (GBC), thereby promoting tumor growth, migration, and invasion, while also enhancing sensitivity to gemcitabine (PMID: 37351966)52. Lastly, PKDCC has been linked to non–small cell lung cancer progression (PMID: 35847849)53.

Table 7 Top ten identified pan-cancer biomarkers and supporting evidence.

Figures 3 and 4 depict the top 10 enriched Gene Ontology (GO) terms identified through MOGKAN analysis, highlighting molecular systems that contribute to multi-cancer classification. Figure 3 shows the top 10 enriched GO terms for biological processes. Each horizontal bar corresponds to a GO term, and the length of the bar indicates the degree of enrichment: the longer the bar, the more enriched the term. The top 10 terms associated with our genes include positive regulation of respiratory burst, regulation of respiratory burst, and apoptotic cell clearance, which are biological processes significantly overrepresented in the analyzed top-50 gene set. These GO terms reflect the role of reactive oxygen species (ROS) in modulating tumor microenvironment dynamics and immunotherapy outcomes54. Overall, the enriched GO terms validate the biological relevance of MOGKAN’s graph-based integration of multi-omics data, reinforcing its ability to uncover functionally significant pathways involved in cancer development and classification.

Fig. 3

Gene ontology enrichment (biological process) of top 50 multi-cancer biomarkers identified by Graph Kolmogorov-Arnold Networks.
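
To make the notion of an "overrepresented" GO term concrete, the sketch below computes an over-representation p-value for a single term. The text does not state which statistical test or tool produced the enrichment results, so the one-sided hypergeometric test used here is an assumption (it is a common choice for GO analysis), and the function name and gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    n_background: int, n_term: int, n_selected: int, n_overlap: int
) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: number of genes in the background set
    n_term:       background genes annotated to the GO term
    n_selected:   genes in the selected list (e.g. a top-50 gene set)
    n_overlap:    selected genes that are annotated to the GO term
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Small hand-checkable case: 10 background genes, 3 annotated, 4 selected, 2 overlapping.
# P(X >= 2) = (C(3,2)*C(7,2) + C(3,3)*C(7,1)) / C(10,4) = 70 / 210 = 1/3
p_toy = hypergeom_enrichment_p(10, 3, 4, 2)
```

In practice, one p-value is computed per GO term and the results are corrected for multiple testing before terms are ranked; that correction step is omitted here for brevity.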

Figure 4 depicts the top 10 significantly enriched Gene Ontology (GO) molecular functions, primarily focused on lipid binding and ion channel regulation, suggesting roles in cellular signaling and membrane dynamics. The prominence of phosphatidylinositol (PI) binding terms—such as Phosphatidylinositol-3,5-Bisphosphate Binding (GO:0080025) and Phosphatidylinositol-3-Phosphate Binding (GO:0032266)—indicates involvement in phosphoinositide signaling, a pathway critical for membrane trafficking, autophagy, and cell survival. These findings align with established research indicating that dysregulation of phosphatidylinositol metabolism is prevalent across various cancers, as it activates the PI3K–AKT–mTOR signaling pathway55 and contributes to treatment resistance53.

Fig. 4

Gene ontology enrichment (molecular function) of top 50 multi-cancer biomarkers identified by Graph Kolmogorov-Arnold Networks.

Figure 5 presents the KEGG pathways56,57,58 enriched in the cancer-related gene set, ranked by statistical significance using –log₁₀(p-value) scores. The “Mucin type O-glycan biosynthesis” pathway emerged as the most significantly enriched, highlighting its role in altering glycosylation patterns on tumor cells, a known contributor to cancer progression59,60. Closely following is the “Sphingolipid metabolism” pathway, which supports tumor cell survival and resistance to therapeutic agents61. The “Prolactin signaling pathway” ranks next in significance and is particularly relevant for its involvement in breast cancer regulation62. Additionally, approximately 15% of all cancer-associated genes in the dataset map to the “PI3K-Akt signaling pathway”, which serves as a central regulator of cellular proliferation and apoptosis and is frequently dysregulated in cancer development63. Further down the ranking, the “Rap1 signaling pathway” was identified, underscoring its role in cell adhesion and metastasis, which aligns with the invasive phenotypes observed in several cancers64. The pathways “Aldosterone-regulated sodium reabsorption”, “Type I diabetes mellitus”, and “Maturity onset diabetes of the young” were also enriched, suggesting shared metabolic disruptions between cancer and diabetes65,66. While several pathways show moderate enrichment with –log₁₀(p-value) scores around 0.6, “Mucin type O-glycan biosynthesis” exhibits the strongest enrichment signal. Collectively, these results reveal biological pathways that describe mechanisms of cancer development, particularly those related to glycosylation events, lipid metabolism, and growth factor signaling.

Fig. 5

Significantly enriched KEGG pathways58,59,60 associated with cancer.
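
Because the pathways are ranked on a –log₁₀(p-value) scale, it can help to convert between scores and p-values when reading the figure. The sketch below is a small reading aid only; the function names are illustrative and no values from the figure are reproduced beyond the score of about 0.6 mentioned in the text.

```python
import math


def neg_log10(p_value: float) -> float:
    """Convert a p-value in (0, 1] to its -log10 score."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)


def p_from_score(score: float) -> float:
    """Convert a -log10 score back to the underlying p-value."""
    return 10.0 ** (-score)


# A score of 0.6 corresponds to a p-value of roughly 0.25.
p_at_0_6 = p_from_score(0.6)

# The conventional 0.05 threshold corresponds to a score of roughly 1.30.
score_at_0_05 = neg_log10(0.05)
```

Higher scores therefore indicate smaller p-values, and each additional unit on the score axis corresponds to a tenfold decrease in the p-value.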

Domain-specific deep learning applications across a variety of biological questions underscore the importance of focused, explainable models. For example, in non–small cell lung cancer (NSCLC), ferroptosis-associated lncRNAs were reported to be strong prognostic markers and predictors of immunotherapy efficacy67. Similarly, deep learning has been used to reconstruct the features of protein transport68 and to diagnose cleft lip and palate through imaging-based ML models69. Together, these studies demonstrate the growing range of applications of interpretable models in clinical diagnostics, which is consistent with the goal of MOGKAN to discover biologically relevant cancer biomarkers using graph-based learning.

Limitations and future work

The proposed MOGKAN framework has several limitations. First, it relies on static PPI network data from the STRING database, which may not capture dynamic or condition-specific protein interactions. Incorporating tissue-specific or context-aware interaction networks could strengthen the biological relevance of the constructed network. Second, while the current model integrates mRNA, miRNA, and DNA methylation data, it omits other valuable omics layers such as proteomics, metabolomics, and copy number variation, which could provide complementary biological insights. Third, we used an early integration strategy, in which multi-omics data (mRNA, miRNA, and methylation) are concatenated into a single feature vector prior to graph modeling. While this approach simplifies representation learning, it may obscure modality-specific characteristics and interactions.

For future work, we plan to extend the framework by incorporating a broader range of multi-omics data and utilizing dynamic and context-specific interaction networks to enhance model performance and biological interpretability. In addition, implementing attention mechanisms to weigh the contributions of different omics features may further improve predictive accuracy. We will also explore late integration strategies, such as modality-specific encoders followed by attention-based or gated fusion mechanisms, which may better capture complementary signals across omics layers and improve both performance and interpretability.
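
The difference between the early integration used here and the late integration proposed for future work can be illustrated with a minimal sketch. It is not part of MOGKAN: the function names are hypothetical, the per-modality embeddings stand in for the outputs of modality-specific encoders, and the gate scores are fixed numbers where a real model would learn them.

```python
import math
from typing import Dict, List


def early_integration(omics: Dict[str, List[float]]) -> List[float]:
    """Concatenate all modality vectors (in dict insertion order) into one feature vector."""
    fused: List[float] = []
    for values in omics.values():
        fused.extend(values)
    return fused


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_gated_fusion(
    embeddings: Dict[str, List[float]], gate_scores: Dict[str, float]
) -> List[float]:
    """Weighted sum of equal-length modality embeddings.

    The weights are a softmax over one gate score per modality, so they are
    positive and sum to 1.
    """
    names = list(embeddings)
    weights = softmax([gate_scores[name] for name in names])
    dim = len(embeddings[names[0]])
    return [
        sum(w * embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]


# Early integration: raw features are simply joined end to end.
raw = {"mRNA": [1.0, 2.0], "miRNA": [3.0], "methylation": [4.0, 5.0]}
early = early_integration(raw)

# Late integration: each modality is first mapped to a shared 2-d space
# (hypothetical embeddings), then combined with gate weights.
emb = {"mRNA": [1.0, 0.0], "miRNA": [0.0, 1.0], "methylation": [1.0, 1.0]}
gates = {"mRNA": 0.0, "miRNA": 0.0, "methylation": 0.0}  # equal scores -> equal weights
late = late_gated_fusion(emb, gates)
```

With early integration the model sees a single undifferentiated vector, whereas the gated variant keeps one weight per modality that can be inspected to see how much each omics layer contributes to the fused representation.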

Conclusion

This study introduces MOGKAN, a novel deep learning framework for accurate and interpretable cancer classification using multi-omics data. The approach integrates a three-step data preprocessing pipeline that combines DESeq2, LIMMA, and LASSO regression to preserve key biological signals while reducing dimensionality. By fusing DNA methylation, miRNA, and mRNA data with Protein-Protein Interaction (PPI) networks, MOGKAN achieves a classification accuracy of 96.28% across 31 cancer types. Through the application of the Kolmogorov–Arnold theorem, the framework extracts hierarchical features that enhance both predictive performance and biological interpretability. Key biomarkers identified by MOGKAN, including MCL1, GALNT6, and ITGB3, were validated through GO and KEGG pathway analyses, confirming their involvement in critical processes such as PI3K-AKT signaling, lipid metabolism, and immune evasion. These findings demonstrate the capability of the proposed framework to uncover fundamental molecular drivers of cancer and support its potential for clinical application in personalized cancer therapy.