Introduction

Head and neck neoplasms are a significant global health concern, contributing to substantial morbidity and mortality1,2. The etiology of these cancers is multifactorial, involving genetic, environmental, and lifestyle factors. Despite extensive research, the molecular mechanisms underlying their development and progression remain poorly understood. Among these mechanisms, microRNAs (miRNAs) have emerged as key regulators of gene expression. Their dysregulation in various cancers, including head and neck neoplasms, suggests a critical role in tumorigenesis3,4,5. This study investigates the association between miRNAs and head and neck neoplasms, focusing on their potential causal roles.

MicroRNAs (miRNAs) are small, non-coding RNA molecules that play a vital role in post-transcriptional gene regulation in plants and animals. While computational methods have been developed to identify miRNA-mRNA regulatory relationships6, determining causal relationships often necessitates labor-intensive and expensive experimental validation1,2,7. A prior study introduced a causality discovery method using observational data from epithelial-to-mesenchymal transition (EMT) datasets, which was subsequently validated through biological experiments to elucidate miRNA-mRNA regulatory interactions2,8.

MicroRNAs are versatile cellular regulators involved in numerous biological processes and diseases9. Understanding their roles in disease requires distinguishing correlation from causation, conducting functional validations, and identifying relevant research findings10,11. A bipartite graph framework was employed to model the connections between miRNAs and diseases12. This approach enabled the construction of a miRNA-associated disease network (MDN), providing insights into how similar diseases cluster within the network13.
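The sketch below makes the bipartite framework concrete. It builds a miRNA–disease bipartite graph from association pairs and then projects it onto the disease side, linking two diseases when they share at least one miRNA. This one-mode projection is a common way to derive a disease network, but it is an assumption here and not necessarily the construction used in the cited work. The disease labels are hypothetical toy data, not HMDD records.

```python
from collections import defaultdict
from itertools import combinations

def build_bipartite(pairs):
    """Index a list of (miRNA, disease) pairs in both directions."""
    mirna_to_dis = defaultdict(set)
    dis_to_mirna = defaultdict(set)
    for mirna, disease in pairs:
        mirna_to_dis[mirna].add(disease)
        dis_to_mirna[disease].add(mirna)
    return mirna_to_dis, dis_to_mirna

def disease_projection(dis_to_mirna):
    """Link two diseases when they share miRNAs; the edge weight is the shared count."""
    edges = {}
    for d1, d2 in combinations(sorted(dis_to_mirna), 2):
        shared = dis_to_mirna[d1] & dis_to_mirna[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges

# Hypothetical toy pairs, for illustration only.
pairs = [
    ("hsa-miR-21", "Disease A"),
    ("hsa-miR-21", "Disease B"),
    ("hsa-let-7g", "Disease A"),
    ("hsa-let-7g", "Disease B"),
    ("hsa-miR-200b", "Disease C"),
]
m2d, d2m = build_bipartite(pairs)
mdn = disease_projection(d2m)
print(mdn)  # {('Disease A', 'Disease B'): 2}
```

Diseases that share many miRNAs receive heavier edges in the projection. This is what allows similar diseases to cluster in such a network.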

Oncogenic miRNAs promote aggressive cancer behaviors, in part by influencing the p53 pathway, which is critical for regulating the cell cycle and apoptosis. For example, miR-125a enhances cell proliferation, migration, and invasion, while miR-134 reduces E-cadherin expression, facilitating cancer cell detachment and metastasis. MiR-134 also suppresses WWOX, a tumor suppressor gene, further contributing to tumor progression. In contrast, tumor-suppressor miRNAs such as let-7i, let-7c, miR-101, and miR-124, which target genes including ARID3B, EZH2, CDK8, and STAT3, inhibit tumor growth and metastasis14. Let-7i suppresses ARID3B expression, thereby limiting the activation of genes associated with stemness and pluripotency that could otherwise enhance cancer cell resilience. Let-7c potentially mitigates inflammatory responses that contribute to cancer progression and treatment resistance. MiR-101 inhibits EZH2, suppressing metastasis and epithelial-mesenchymal transition (EMT), which suggests a protective role against cancer spread. Additionally, miR-101’s inhibition of CDK8, a key player in transcriptional regulation, highlights a mechanism for preventing tumor growth by modulating critical signaling pathways. Lastly, miR-124 inhibits STAT3, a protein often activated in cancers, to limit cell survival and proliferation15.

While computational methods have advanced our understanding of miRNA-mRNA regulatory relationships, many statistical approaches fail to uncover causal connections16. A previous study proposed a causality discovery method to identify regulatory relationships between miRNAs and mRNAs. Because it was validated through biological experiments, the method offers cost-effective guidance for designing miRNA experiments and understanding miRNA-mRNA interactions. In a related graph-based setting, a recent model for predicting lncRNA-disease associations achieved an average AUC of 96.07% and an average AUPR of 93.23% under fivefold cross-validation13. Ablation studies of that model confirmed the importance of homogeneous graphs and node path weights. In addition, mintRULS, a recent regularized least squares-based method, predicts miRNA-mRNA interactions using similarity metrics derived from free energy, sequence and repeat identities, and target-site accessibility. It achieved an accuracy of 0.93, outperforming comparable methods17,18.

Graph neural networks (GNNs) play a pivotal role in cancer informatics, enabling the analysis of molecular structures, drug response prediction, biomarker identification, patient stratification, and drug discovery and repurposing. Among these, Graph Attention Networks (GATs), a specific type of GNN, excel at learning attention weights to highlight the importance of nodes and edges within a graph. GATs predict potential disease associations in biological networks by representing complex interactions between genes, proteins, and other molecular entities. These models use techniques such as network representation, node feature encoding, attention mechanisms, aggregation, propagation, and edge prediction to identify critical nodes and capture nuanced dependencies. This capability facilitates accurate edge predictions in disease association studies, enabling researchers to uncover molecular mechanisms underlying diseases and potentially accelerate the discovery of novel disease associations and targeted therapeutic interventions.

GraphTar19, a novel target prediction method leveraging RNA sequences with a graph neural network classifier, has demonstrated comparable performance to existing methods but requires further research and dataset expansion. Similarly, DGAMDA20, an AI-based model for miRNA-disease association prediction, combines feature mapping and attention mechanisms to outperform traditional methods in feature mining and prediction for unknown diseases21,22,23. GATs have proven especially effective for miRNA-disease prediction due to their ability to model relationships between miRNAs, diseases, and other biological entities while integrating heterogeneous data24,25. These networks capture complex interactions and dependencies26, as demonstrated by previous studies in which GAT models effectively extracted nonlinear features of diseases and miRNAs through the weighted aggregation of neighboring information. In cases such as breast, colon, and lymphoma cancers, GATs have achieved remarkable performance, including an average AUC of 0.9566 with five-fold cross-validation, surpassing earlier models and aligning with biological experiments25.

In recent years, deep learning and graph-based techniques have produced several state-of-the-art computational models that improve the prediction of miRNA–disease associations. By combining miRNA, disease, and lncRNA data within a heterogeneous framework, HGANMDA’s hierarchical graph attention network improves semantic learning and prediction accuracy27. GCNFORMER enhanced predictive power by combining transformer architectures, which capture long-range dependencies, with graph convolutional networks, which capture local graph structure28. PDMDA, an end-to-end deep learning method built on graph neural networks, better represented complex disease-association patterns29. MUSCLE applied multi-view and multi-scale attention mechanisms for feature fusion, yielding robust and interpretable predictions of miRNA–disease associations30. Finally, MAMFGAT integrated diverse biological data through multiplex adaptive modality fusion in a graph attention network and outperformed baseline models26. These studies illustrate the field’s innovation and momentum, and they provide a comparative basis for the proposed GAT-based methodology.

Although existing studies highlight the significant role of miRNAs in disease prediction, research specifically addressing miRNA-disease associations in oral cancer using GATs is limited. Only a few studies20,21,31 have applied GATs to analyze causal relationships between miRNAs and mRNAs during biological processes. These studies combine linear features derived from disease-miRNA correlation profiles with non-linear features extracted using GATs. This study addresses this gap by identifying and predicting miRNA-disease causation using GATs in the context of head and neck cancer.

Materials and methods

Figure 1 illustrates the architecture and workflow of a Graph Attention Network (GAT). It begins with an input layer where graph data is introduced, followed by pre-processing layers that clean and extract features. The graph is then constructed with nodes and edges. The GAT layer applies multi-head attention mechanisms to effectively capture node interactions. Subsequent hidden layers with 128 neurons further process the data, culminating in an output layer that predicts graph-related outcomes.

Fig. 1

Graph Attention Network (GAT) Architecture and Workflow.

Preparation of the Dataset.

The dataset was obtained from HMDD v4.0, a database that provides comprehensive data on the causality of miRNA–disease associations32. HMDD v4.0 is built on SQLite and Django. It lets users browse miRNA–disease associations, access detailed information, and download curated associations in .txt and .xlsx formats. Each entry contains the miRNA name, disease, causality (Yes/No), category, PubMed ID, and evidence. We extracted all entries for ‘Head and Neck Neoplasms’ and its related subtypes. ‘Yes’ pairs are associations with verified causal evidence, and ‘No’ pairs are curated associations without causality evidence. No additional negative samples were added. The resulting head and neck neoplasm subset contained 163 nodes and 210 edges, comprising 105 causal and 105 non-causal miRNA–disease pairs (a 1:1 ratio). Because the classes were already balanced, no oversampling or undersampling was required. Given the small dataset size, overfitting was mitigated with five-fold cross-validation, dropout (0.5), early stopping, and a lightweight GAT architecture. To assess generalizability, the model was also trained and evaluated over five runs with different random seeds using an 80/10/10 split. The mean and standard deviation of AUC and AUPR across runs were reported as the primary performance metrics to ensure reproducibility and robustness.
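The following is a minimal sketch of how per-run metrics could be computed and summarized. It estimates ROC AUC with the Mann-Whitney formulation and AUPR as average precision, then reports the mean and sample standard deviation across seeded runs. The function names are hypothetical, and average precision is only one of several AUPR estimators in common use. The toy labels and scores are placeholders, not results from this study. Library implementations such as scikit-learn's `roc_auc_score` and `average_precision_score` would normally be used instead.

```python
import statistics

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AUPR estimated as average precision over the ranked list (ties not specially handled)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at each rank where a positive appears
    return total / hits if hits else 0.0

def summarize_runs(values):
    """Mean and sample standard deviation of one metric across seeded runs."""
    return statistics.mean(values), statistics.stdev(values)

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))                      # 0.75
print(round(average_precision(labels, scores), 4))  # 0.8333
```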

For this study, the head and neck neoplasm subset was extracted. The dataset includes the miRNA, disease, causality (categorized as “no” and “yes”), and PMID. Nodes were assigned to miRNAs, diseases, and causality, while edges represented the interactions between them. Node features were labeled and analyzed using graph neural networks, focusing on edge interactions corresponding to causality. Data preprocessing removed missing values and duplicates and then applied one-hot encoding to categorical variables (Fig. 1). To ensure proportionate representation of the positive (“yes”) and negative (“no”) classes, the dataset was split into three subsets: 80% for training, 10% for validation, and 10% for testing. Model learning and optimization were performed on the training set, and hyperparameter tuning and early stopping were conducted on the validation set. Final performance and generalization were then assessed on the test set. Dropout regularization and cross-validation were used to limit overfitting. Balanced sampling, which ensured an equal number of positive and negative examples, mitigated class bias.
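The preprocessing and splitting steps above can be sketched as follows. The record field names (`mirna`, `disease`, `causality`) and the helper functions are hypothetical. The split shuffles each class separately with a fixed seed so that the 80/10/10 partition preserves the yes/no ratio. With 105 pairs per class, integer truncation gives 84/10/11 pairs per class, so the test set absorbs the remainder.

```python
import random

def preprocess(records):
    """Drop rows with missing key fields and exact duplicates (the first occurrence is kept)."""
    seen, clean = set(), []
    for r in records:
        key = (r.get("mirna"), r.get("disease"), r.get("causality"))
        if any(v in (None, "") for v in key) or key in seen:
            continue
        seen.add(key)
        clean.append(r)
    return clean

def one_hot(values):
    """Map each distinct categorical value to a one-hot vector over a sorted vocabulary."""
    vocab = sorted(set(values))
    return {v: [1 if j == i else 0 for j in range(len(vocab))] for i, v in enumerate(vocab)}

def stratified_split(records, fractions=(0.8, 0.1, 0.1), seed=42):
    """Shuffle each class separately and cut it into train/val/test to preserve class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r["causality"], []).append(r)
    train, val, test = [], [], []
    for label in sorted(by_class):
        rows = by_class[label][:]
        rng.shuffle(rows)
        n_train = int(fractions[0] * len(rows))
        n_val = int(fractions[1] * len(rows))
        train += rows[:n_train]
        val += rows[n_train:n_train + n_val]
        test += rows[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```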

Graph Attention Network Architecture.

GATs22,23 are graph neural networks that use attention mechanisms to capture the importance of, and dependencies between, nodes and edges in a graph. For each pair of connected nodes, the attention mechanism computes a coefficient that quantifies how relevant a neighbor’s features are to a given node. These coefficients are computed from node features transformed by a learnable weight matrix. They are then normalized, typically with a softmax over each node’s neighbors, so that the importance of different neighbors can be compared. GATs aggregate information from neighboring nodes, weighted by these coefficients, to capture the local neighborhood context. GATs often use multiple attention heads to improve performance. This enables the model to extract diverse structural features and makes GATs well suited to tasks such as edge prediction in disease association studies.
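For reference, the attention computation in the standard GAT formulation is shown below. Here h_i is the feature vector of node i, W is the shared learnable weight matrix, a is the learnable attention vector, ∥ denotes concatenation, N(i) is the neighborhood of node i, σ is a nonlinearity, and M is the number of attention heads. The text does not state which GAT variant was implemented, so this standard form is an assumption.

```latex
\begin{aligned}
e_{ij} &= \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\right]\right) \\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})} \\
\mathbf{h}_i' &= \Big\Vert_{m=1}^{M} \sigma\!\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(m)}\, \mathbf{W}^{(m)}\mathbf{h}_j\right)
\end{aligned}
```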

The GAT architecture is particularly suited for graph-structured data. Each node is associated with a feature vector that encodes its attributes. The attention mechanism assigns weights to neighboring nodes based on feature similarity, allowing the model to prioritize relevant neighbors while reducing the influence of less pertinent ones. GATs perform graph convolution operations to aggregate node features and generate updated node representations, with attention weights determining the contribution of each neighbor’s feature vector. Multi-head attention allows the model to capture various aspects of the graph by having each attention head independently compute weights and aggregate features. These representations are processed through a classification or regression layer to produce the final predictions.
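The sketch below implements one attention head and multi-head concatenation in plain Python. It follows the standard form: LeakyReLU with slope 0.2, a softmax over each neighborhood, and an ELU output. It is illustrative only. The study used a PyTorch implementation whose code is not given, and the function names, toy features, and weights here are hypothetical. A real implementation would more likely use a library layer such as PyTorch Geometric's `GATConv`.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_head(H, neighbors, W, a):
    """One attention head.

    H: node feature vectors; neighbors[i]: nodes attended by i (include i for a
    self-loop); W: weight matrix; a: attention vector of length 2 * out_dim.
    """
    Wh = [matvec(W, h) for h in H]
    out = []
    for i, nbrs in enumerate(neighbors):
        # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]); list "+" concatenates the two vectors.
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, Wh[i] + Wh[j]))) for j in nbrs]
        # Softmax over the neighborhood, shifted by the max for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]
        # Attention-weighted sum of transformed neighbor features.
        agg = [sum(al * Wh[j][d] for al, j in zip(alphas, nbrs)) for d in range(len(Wh[i]))]
        out.append([x if x > 0 else math.exp(x) - 1 for x in agg])  # ELU
    return out

def multi_head_gat(H, neighbors, heads):
    """Run independent heads, each given as a (W, a) pair, and concatenate their outputs."""
    per_head = [gat_head(H, neighbors, W, a) for W, a in heads]
    return [sum((h[i] for h in per_head), []) for i in range(len(H))]
```

Concatenation keeps each head's view of the neighborhood separate. Averaging the heads instead, as is often done in a final GAT layer, would keep the output dimension fixed.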

This study used a PyTorch implementation of the GAT model, incorporating multi-head attention and a semantic-layer attention vector to improve representation learning. The model was trained for 100 epochs with a learning rate of 0.001, hidden layers of 128 neurons, a batch size of 32, and a random seed of 42. Training used the Adam optimizer, chosen for its adaptive learning rates, with GPU acceleration for efficiency. Dropout regularization was applied to randomly drop nodes and edges, which reduces overfitting and improves the model’s generalizability (Tables 1 and 2).
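The text specifies a dropout rate of 0.5 applied to nodes and edges but does not describe the mechanism. The following minimal sketch assumes DropEdge-style random edge removal and standard inverted dropout on node features. Both functions are hypothetical illustrations and are seeded with 42 to mirror the stated seed. In a PyTorch implementation, feature dropout is usually provided by `torch.nn.Dropout`.

```python
import random

def drop_edges(edges, p=0.5, seed=42):
    """DropEdge-style sketch: keep each edge independently with probability 1 - p."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= p]

def feature_dropout(features, p=0.5, seed=42, training=True):
    """Inverted dropout: zero each value with probability p and rescale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(features)  # dropout is disabled at evaluation time
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in features]
```

During training, a fresh mask would be drawn at every epoch (for example, by passing `seed=42 + epoch`). Rescaling the surviving values keeps the expected feature magnitude unchanged between training and evaluation.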

Table 1 shows the hyperparameters employed in this study.

Table 1 Hyperparameters.

Table 2 shows the benchmark methods and hyperparameter settings.

Table 2 Benchmark methods and hyperparameter settings.

In this study, the Graph Attention Network model was implemented using the PyTorch deep learning framework (likely version ≥ 1.10; [https://pytorch.org/](https://pytorch.org/)), leveraging multi-head attention and semantic-layer mechanisms for predicting microRNA-disease associations. The dataset was sourced from the Human microRNA Disease Database (HMDD) version 4.0 ([http://www.cuilab.cn/hmdd](http://www.cuilab.cn/hmdd)), which is built on SQLite and Django. Python (version 3.8 or later) served as the primary programming language for model development and data preprocessing, while commonly used libraries such as Pandas, NumPy, and Scikit-learn were likely utilized for tasks including data cleaning, one-hot encoding, and standardization.

Results

The Graph Attention Network (GAT) model achieved an accuracy of 83% in predicting the causality of microRNA-disease associations in head and neck neoplasms. We attribute this performance to the combination of graph neural networks and attention mechanisms with a high-quality dataset. HMDD v4.0 is recognized as a comprehensive and reliable resource for microRNA-disease associations, and it contributed substantially to the model’s performance. The model also leveraged the intrinsic characteristics of head and neck neoplasms, incorporating features such as microRNA expression profiles, disease-specific attributes, and insights from the relevant literature.

The GAT model’s performance reflects three strengths: its ability to capture intricate dependencies within the graph structure, its use of attention mechanisms to prioritize critical information, and its training on a robust dataset. The 83% accuracy aligns with findings from previous studies, reinforcing the utility of graph neural networks for complex microRNA-disease association prediction tasks. Alongside this accuracy (0.83), the model achieved a balanced accuracy of 0.6881, a macro F1-score of 0.6878, and a weighted F1-score of 0.864. We also evaluated macro precision (0.6896), macro recall (0.6881), MCC (0.7497), Cohen’s kappa (0.7437), and log loss (0.2783), which further support the model’s robustness. We then compared the GAT model statistically with four benchmark methods. Paired t-tests showed that GAT significantly outperformed GCNCRF (p = 0.0332). The differences from DGAMDA (p = 0.2495), GATMDA (p = 0.0955), and MGCNRF (p = 0.1692) were not statistically significant. Effect sizes (Cohen’s d) were medium to large in all comparisons (DGAMDA: 0.65, GATMDA: 0.97, GCNCRF: 1.29, MGCNRF: 0.78). This suggests a practical advantage for GAT, although only the GCNCRF comparison reached statistical significance.
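The paired comparison can be sketched as follows. The sketch assumes that the t-tests paired GAT and benchmark scores fold by fold (or run by run) and that Cohen's d was computed as d_z, the mean difference divided by the standard deviation of the differences. The text does not specify either choice. Because the t statistic equals d_z multiplied by the square root of the number of pairs, a medium-to-large effect need not reach significance when there are only five pairs. The two-sided p-value requires the t distribution, which is available, for example, through `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t_and_cohens_dz(gat_scores, baseline_scores):
    """Paired t statistic, Cohen's d_z, and degrees of freedom for matched scores."""
    if len(gat_scores) != len(baseline_scores) or len(gat_scores) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [g - b for g, b in zip(gat_scores, baseline_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    d_z = mean_diff / sd_diff
    t_stat = d_z * math.sqrt(n)  # identical to mean_diff / (sd_diff / sqrt(n))
    return t_stat, d_z, n - 1
```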

To assess the biological relevance of the identified miRNAs, we conducted a functional enrichment analysis using the miEAA 2.0 tool, which revealed significant enrichment in key pathways related to oral cancer. The TGF-beta signaling pathway was the most enriched, underscoring its critical role in epithelial-mesenchymal transition and tumor progression in oral squamous cell carcinoma (OSCC). The analysis also identified enrichment in the PI3K-Akt signaling pathway, which is involved in cell proliferation and survival mechanisms relevant to OSCC. Enrichment was also found in the ECM-receptor interaction pathway, which is essential for cell adhesion and migration, processes that are vital for cancer metastasis.
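The internal procedure of miEAA 2.0 is not detailed here. As a hedged illustration, over-representation analysis is commonly based on a one-sided hypergeometric test, followed by Benjamini-Hochberg adjustment across the pathways tested. The sketch below implements both with the Python standard library. The function names are hypothetical, and any counts passed to them are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: miRNAs in the background, K: background miRNAs annotated to the pathway,
    n: miRNAs in the query set, k: query miRNAs annotated to the pathway.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```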

Figures 2a and 2b present a graph analysis of the microRNA-disease association network for head and neck neoplasms. The network consists of 163 nodes, representing microRNAs and diseases, connected by 210 edges. This yields a network density of 0.0159, indicating a sparse structure that may affect the robustness of the associations being studied. The nodes with the highest degree centrality include the disease entities “Squamous Cell Carcinoma, Head and Neck,” “Head and Neck Neoplasms,” “Carcinoma, Head and Neck,” and “Head and Neck Adenoid Cystic Carcinoma,” as well as key microRNAs such as “hsa-let-7g,” indicating their central role in the network’s connectivity. Figure 2b reveals shared miRNAs, such as hsa-let-7g and hsa-miR-21, that suggest broader roles across various head and neck cancer types. It also shows more specific miRNAs, such as hsa-miR-200b and hsa-miR-125a, that are associated with squamous cell carcinoma. This figure clarifies the miRNA-mediated regulatory mechanisms associated with the heterogeneity of head and neck cancer and aids in identifying potential diagnostic biomarkers.
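The reported density can be checked directly. For a simple undirected graph with N nodes and E edges, density is 2E / (N(N - 1)). With N = 163 and E = 210, this gives 420 / 26,406 ≈ 0.0159, which matches the value above; a directed interpretation would give half that value. The sketch also shows degree centrality normalized by N - 1, a common convention. The helper names are illustrative.

```python
def network_density(n_nodes, n_edges):
    """Density of a simple undirected graph: 2E / (N(N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

def degree_centrality(adjacency):
    """Degree of each node divided by N - 1, the maximum possible degree."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}

print(round(network_density(163, 210), 4))  # 0.0159
```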

Fig. 2

(a) Graph analysis of the microRNA-disease association network for head and neck neoplasms. (b) The network diagram illustrates the colors corresponding to the subtypes of head and neck cancers: Head and Neck Carcinoma (red), Head and Neck Neoplasms (blue), Squamous Cell Carcinoma (pink), and Adenoid Cystic Carcinoma (cyan). It highlights the relationships between specific microRNAs (miRNAs) and these cancers. The edges represent known associations, while the nodes indicate miRNAs or cancer subtypes.

Further analysis shows that the five nodes with the highest betweenness centrality act as vital conduits for information flow. Nodes with high closeness centrality reach the rest of the network through short paths, enabling efficient signal transmission. Nodes with the highest eigenvector centrality are connected to other well-connected nodes, signifying their overall influence within the broader network of disease interactions. Understanding these dynamics may yield deeper insights into the underlying biological mechanisms and potential therapeutic targets in head and neck neoplasms.
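Closeness and eigenvector centrality can be sketched on an unweighted adjacency list as follows. Closeness uses breadth-first search distances. Eigenvector centrality uses power iteration on A + I. This shift keeps the same leading eigenvector as A but prevents the oscillation that plain power iteration shows on bipartite graphs, such as a miRNA–disease network. Betweenness centrality (Brandes' algorithm) is longer and is available as `networkx.betweenness_centrality`. The functions and the toy star graph are illustrative assumptions, not the study's analysis code.

```python
import math
from collections import deque

def closeness_centrality(adj, node):
    """(r - 1) / sum of BFS distances, where r counts nodes reachable from `node` (itself included)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def eigenvector_centrality(adj, max_iter=100, tol=1e-10):
    """Power iteration on (A + I), L2-normalized at each step, until the change falls below tol."""
    x = {u: 1.0 for u in adj}
    for _ in range(max_iter):
        new = {u: x[u] + sum(x[v] for v in adj[u]) for u in adj}
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {u: val / norm for u, val in new.items()}
        if sum(abs(new[u] - x[u]) for u in adj) < tol:
            return new
        x = new
    return x

# Hypothetical star graph: one hub miRNA linked to three disease nodes.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
```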

Figure 3 presents the epoch-wise loss curve and the model’s performance across 80 epochs. The x-axis denotes the number of completed epochs, and the y-axis represents the loss value. The loss starts at approximately 0.7 at epoch zero and decreases steadily to 0.5 by epoch 80, signifying improved performance and more accurate predictions. This reduction reflects the model’s learning process as its parameters are optimized and adapted to the dataset.

Fig. 3

Training and validation loss curves along with model performance metrics across epochs. The epoch-wise loss curves depict the training loss (blue) and validation loss (orange), demonstrating the model’s convergence over time. The model’s performance metrics (e.g., accuracy, F1-score) are plotted per epoch, showing improvements with training. Shaded regions represent standard deviation (if applicable) across multiple runs.
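If the training objective is binary cross-entropy (an assumption, since the text does not name the loss), the starting value of about 0.7 is what an uninformative classifier would produce. A model that predicts 0.5 for every pair scores ln 2 ≈ 0.693. The function below is a hypothetical minimal implementation used only to show this.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

# An uninformative classifier that outputs 0.5 for every pair scores ln 2.
print(round(binary_cross_entropy([1, 0, 1, 0], [0.5] * 4), 3))  # 0.693
```

Under the same assumption, a mean loss of 0.5 corresponds to a geometric-mean probability of about exp(-0.5) ≈ 0.61 assigned to the correct class.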

The observed decrease in loss indicates an improved capacity for accurate prediction. However, this trend should be interpreted in light of factors such as the learning rate, regularization techniques, dataset size, and data complexity. The model’s performance on unseen validation or test data should also be assessed to confirm its ability to generalize and to rule out overfitting.
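Early stopping on the validation set, described in the Methods, could account for Figure 3 spanning 80 epochs although training was configured for 100. The sketch below shows patience-based early stopping on a validation-loss curve. The function name and the patience value are hypothetical, because the text reports neither.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than min_delta
    for `patience` consecutive epochs; the checkpoint from best_epoch is kept.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1
```

Restoring the best checkpoint rather than the last one means that the reported test metrics come from the epoch with the lowest validation loss.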

Figure 4 depicts a confusion matrix, a crucial tool used to evaluate the performance of a classification model in machine learning. It provides a comprehensive way to compare the model’s predictions against the actual values of the target variable. By organizing the results into a structured format, the confusion matrix enhances our understanding of how well the model performs across different classes.

Fig. 4

Normalized confusion matrix for the model’s classification performance on the test set. Each row represents the true class, while each column corresponds to the predicted class. Diagonal entries indicate correct classifications, whereas off-diagonal entries show misclassifications. The matrix is normalized to show proportions (or counts, if specified) for better interpretability. The color intensity reflects the frequency of predictions, with darker shades indicating higher values.
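For a binary yes/no causality label, the metrics reported in the Results can be derived from confusion-matrix counts. The sketch below row-normalizes a matrix as in Figure 4, with rows as true classes. It then computes accuracy, balanced accuracy, positive-class and macro F1, MCC, and Cohen's kappa. The function names and example counts are hypothetical and do not reproduce the study's matrix.

```python
import math

def row_normalize(matrix):
    """Divide each row (true class) by its total so that entries become proportions."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in matrix]

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def binary_metrics(tp, fn, fp, tn):
    """Metrics for the 2x2 matrix [[tp, fn], [fp, tn]], with the positive class first."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sens = tp / (tp + fn) if tp + fn else 0.0  # recall of positives
    spec = tn / (tn + fp) if tn + fp else 0.0  # recall of negatives
    ppv = tp / (tp + fp) if tp + fp else 0.0   # precision for positives
    npv = tn / (tn + fn) if tn + fn else 0.0   # precision for negatives
    f1_pos, f1_neg = f1(ppv, sens), f1(npv, spec)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Chance agreement from the true (row) and predicted (column) marginals.
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return {
        "accuracy": accuracy,
        "balanced_accuracy": (sens + spec) / 2,
        "f1_pos": f1_pos,
        "macro_f1": (f1_pos + f1_neg) / 2,
        "mcc": mcc,
        "kappa": kappa,
    }

print(row_normalize([[8, 2], [1, 9]]))  # [[0.8, 0.2], [0.1, 0.9]]
```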

Discussion

MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression and play a critical role in various diseases, including cancer. Acting as either oncogenes or tumor suppressors, miRNAs influence processes such as uncontrolled cell proliferation and evasion of apoptosis. In head and neck neoplasms, miRNAs contribute to cancer pathogenesis by regulating the cell cycle, apoptosis, and metastatic potential. Importantly, miRNAs have emerged as potential biomarkers for the early detection and diagnosis of head and neck cancers, with their expression levels offering valuable prognostic information33,34,35,36. For instance, overexpression of miR-196a has been associated with poor prognosis and reduced survival rates in patients with head and neck cancers, whereas decreased levels of miR-200b and miR-200c correlate with aggressive tumor behavior and poorer clinical outcomes19,37,38.

This study introduces a novel approach for predicting microRNA-disease associations in oral cancer using Graph Neural Networks (GNNs). The proposed model incorporates attention mechanisms, multiple attention heads, and semantic-layer attention vectors to enhance representation learning, demonstrating its applicability to oral cancer and head and neck neoplasms.

Graph-based deep learning methods can provide precise insights into disease associations by capturing the complex relationships within biological data. Here, the Graph Attention Network (GAT) was employed to model microRNA-disease causality and uncover novel insights into microRNA associations in head and neck neoplasms. In related work, a recently developed pseudotime causality method identified causal relationships between miRNAs and mRNAs during biological processes, outperforming existing methods39.

Integrating GAT-based models with innovative features has shown potential for advancing disease prediction40. For example, the GATMDA model combines linear features from disease and miRNA correlation profiles with non-linear features extracted by a graph attention network, achieving an average AUC of 0.9566, which surpasses earlier models26. Similarly, a prior study introduced GCNCRF, which combines a graph convolutional network (GCN) with a conditional random field (CRF) to predict human lncRNA-miRNA interactions. GCNCRF achieved an area under the receiver operating characteristic (ROC) curve of 0.947 on the primary dataset, outperforming six state-of-the-art methods41. Another study applied graph convolutional networks with a graph attention layer to encode and learn metabolite-disease associations, achieving areas under the ROC and precision-recall curves of 0.95 and 0.405, respectively42.

Advances in deep learning have also shown promise in other domains. For instance, the DCAMCP model was developed to distinguish carcinogenic from non-carcinogenic compounds. Validated through cross-validation and external testing, this model achieved high accuracy, sensitivity, specificity, and area under the ROC curve43. These findings underscore the potential of graph-based and deep learning models in biomedical research, particularly for complex disease association studies.

The MAMFGAT model44 employs adaptive fusion and cross-modality learning to enhance feature extraction, demonstrating superior performance compared to other approaches in cancer studies. The HGANMDA network utilizes node and semantic-layer attention to predict miRNA-disease associations effectively. Its high performance in predicting these associations provides valuable insights for researchers studying specific diseases. However, addressing the limitations of employing longer meta-paths remains critical for future improvement. Similarly, HGATMDA20,22,23,45, a GAT-based computational method, predicts miRNA-disease associations using weighted DeepWalk and GAT techniques46. This method outperforms existing approaches in identifying disease-related miRNAs, suggesting its potential for diagnostic and therapeutic applications. These studies collectively demonstrate the effectiveness of GATs in analyzing microRNA-disease associations with remarkable accuracy47.

In this study, we applied a Graph Attention Network (GAT) to predict microRNA-disease causality in head and neck neoplasms48. The model achieved an accuracy of 83% in predicting these associations (Figs. 2, 3 and 4). The GAT model showed superior capability in identifying authentic associations compared with traditional statistical and machine learning techniques. Its adaptive learning capacity enables the extraction of complex features and non-linear patterns, which improves accuracy in studying diseases such as head and neck cancers and highlights its potential for enhancing diagnostic and therapeutic strategies. On general datasets, however, the proposed GAT model performed marginally worse than DGAMDA (~85%) and GCNCRF (~86%). Nonetheless, because it focuses on causal prediction specific to head and neck neoplasms, our model provides biologically interpretable results tailored to disease mechanisms.

Graph-based deep learning models such as GAT have shown promise in predicting microRNA-disease associations across various cancers, but they also face several limitations. These include challenges in data quality and completeness, limited generalizability, computational resource demands, overfitting risks, feature selection bias, and the inability to account for dynamic biological changes. Because the model is static, it may not capture evolving biological processes, which can reduce the accuracy of its predictions. Furthermore, the lack of transparency in how the model derives its predictions can hinder its acceptance in clinical and research settings, where explainability is crucial.

Future research should focus on overcoming these limitations to enhance the model’s applicability and utility. Prediction of miRNA-disease associations could be strengthened by dynamic and temporal modeling, which tracks expression changes over time. It could also benefit from multi-omics integration, which combines transcriptomic, proteomic, and epigenomic data for deeper insights. Explainable AI (XAI) is crucial for clinical adoption because it supports trust and transparency. Federated learning enables secure collaboration across institutions while preserving genomic data privacy. Incorporating dynamic biological changes, optimizing computational efficiency, and improving the interpretability of the model will be essential. Addressing these challenges should improve the accuracy and robustness of the GAT model and promote its adoption in both clinical and research environments, ultimately advancing our understanding and management of head and neck cancers.

Conclusion

This study presents a novel application of GATs for predicting microRNA–disease causality in head and neck neoplasms, with an accuracy of 83%, and identifies biologically relevant miRNAs such as miR-125a, miR-134, and let-7c. By utilizing graph topology and multi-head attention, the model found significant correlations in a carefully curated miRNA–disease network. Its main contributions include interpretable graph-based predictions, a modular decomposition that exposes high-centrality hubs, and the method’s possible application to identifying oral cancer biomarkers. Limitations include the use of static network data, class imbalance, and the lack of experimental validation. Future directions include incorporating tissue-specific or temporal datasets, using cost-sensitive loss functions and positive-unlabeled (PU) learning, extending the approach to multi-modal biological networks, and applying explainable AI techniques. Overall, the proposed GAT framework offers a viable path toward causal inference and biomarker prioritization in head and neck cancer research.