Introduction

Squamous cell carcinoma of the head and neck (HNSCC) is an aggressive malignancy characterized by high morbidity. Approximately 650,000 new HNSCC cases are reported annually, and only 40–50% of patients survive for five years1. This cancer arises in the mucosal lining of the upper respiratory tract, including the oral cavity, oropharynx, larynx, and hypopharynx. The main risk factors associated with the development of HNSCC include smoking tobacco, alcohol consumption, the use of smokeless tobacco products, genetic susceptibility, and human papillomavirus (HPV) infection2. The principal treatment strategy for HNSCC remains limited to surgery with adjuvant standard radio- or chemotherapy, which is associated with high-grade toxicities and reduces quality of life. Although neoadjuvant chemotherapy can be effective for locally advanced tumors to prevent the development of distant metastases, approximately 20% of HNSCC patients still present with distant metastatic tumors after treatment. Moreover, such tumors are challenging to treat and eradicate surgically because of the complexity of maxillofacial anatomy, the challenges of surgical access, and the necessity of maintaining functionality3. These factors underscore the urgency of developing effective treatment strategies for distant metastasis in HNSCC.

Distant metastasis is a major factor associated with poor prognosis and, in turn, reduced survival in HNSCC, but its underlying molecular mechanisms are not well understood4. Without clear insight into these mechanisms, effective treatments will remain unavailable. Therefore, a deeper understanding of these mechanisms is a crucial unmet need to improve patient outcomes.

Recent developments in high-throughput technologies provide genome-scale snapshots of gene expression data (i.e., transcriptomes), a comprehensive source of gene relationship information. Identifying biological molecule interactions has paved the way for decoding complex molecular mechanisms.

The dynamic nature of interactions among genes is associated with changes in cellular conditions such as response to various external stimuli and signals (i.e., the presence or absence of certain hormones and metabolites, as well as ionic homeostasis)5,6. In more detail, biological processes may be interwoven because some proteins have multiple molecular roles and may participate in different processes; on the other hand, biological processes can be turned on or off under changes in cellular conditions. Therefore, such cross-talk among biological circuits can influence the interaction of two genes in a nontrivial way7.

A notable example of dynamic correlation is the interaction between the growth hormone and the thyroid hormone receptor retinoid X receptor dimer (TR-RXR) in response to expression levels of thyroid hormone. A direct correlation is observed between TR-RXR and growth hormone expressions in the presence of thyroid hormone. Meanwhile, in biological conditions with low or no expression of thyroid hormone, an inverse correlation is observed between TR-RXR and growth hormone expressions6. Therefore, the strength and pattern of correlation between the gene expression profiles of two genes may be affected by the internal changes and the cellular state.

Several statistical measures have been employed to detect co-expression patterns between a pair of genes, such as the Pearson correlation8, empirical Bayesian approach9, mutual information10, and entropy-based measures11. Nevertheless, conventional approaches do not provide efficient detection of dynamic changes in gene co-expression patterns.

Previously, a statistical measure was introduced by Li7, named liquid association (LA), to capture the dynamic nature of co-expression relations in various cellular conditions. The term “liquid”, in contrast to “solid”, implies a dynamic association pattern between genes modulated differently under various cellular conditions. More specifically, the LA measure quantifies the change in the co-expression pattern of two genes ({X, Y}) following alterations in the expression level (or genotype) of a third gene (Z), known as a switch gene. Accordingly, the dynamic correlation pattern is also known as the three-way interaction model. This model can decipher the sophisticated molecular relations at a higher level than conventional co-expression patterns. Moreover, such a model provides an effective framework to detect the mechanisms that switch biological processes on or off12,13.
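As an illustration of the idea, the LA score is commonly estimated as the sample mean of the product X·Y·Z after each expression profile has been converted to normal scores and standardized. The sketch below follows that estimator using only the Python standard library. The function names and the synthetic data are illustrative assumptions, not the pipeline used in this study.

```python
import math
import random
from statistics import NormalDist


def normal_scores(values):
    """Rank-based transform: replace each value by a standard normal quantile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    nd = NormalDist()
    for rank, idx in enumerate(order, start=1):
        scores[idx] = nd.inv_cdf(rank / (n + 1))
    return scores


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def la_score(x, y, z):
    """Estimate LA(X, Y | Z) as the mean of X*Y*Z on transformed profiles."""
    xs = standardize(normal_scores(x))
    ys = standardize(normal_scores(y))
    zs = standardize(normal_scores(z))
    return sum(a * b * c for a, b, c in zip(xs, ys, zs)) / len(xs)


# Synthetic example (hypothetical data): the X-Y correlation is negative
# when Z is low and positive when Z is high, so Z behaves like a switch gene.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [rng.gauss(0, 1) for _ in range(500)]
y = [(1 if zi > 0 else -1) * xi + 0.5 * rng.gauss(0, 1) for xi, zi in zip(x, z)]
y_unrelated = [rng.gauss(0, 1) for _ in range(500)]

print("LA with switch behaviour:", la_score(x, y, z))
print("LA without switch behaviour:", la_score(x, y_unrelated, z))
```

A clearly positive score indicates that the X–Y correlation increases with Z, a clearly negative score indicates the opposite, and a score near zero suggests no Z-dependent change.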

Despite its metastatic nature receiving scant attention, the molecular markers and pathways associated with HNSCC have been extensively studied14,15,16,17,18,19,20,21. Some of the most significant potential biomarkers related to the metastatic nature of HNSCC include MAL22, Loricrin23, EGFR, VEGF, claudin7, maspin and SCCA24 genes. Moreover, previous studies identified perturbations in some signaling pathways that can make HNSCC prone to distant metastasis. The primary reported pathways include extracellular matrix remodeling, hypoxia, and angiogenesis25, as well as the NTS and NTSR1 oncogenic pathways4. Although the above findings indicate some biomarkers and pathways associated with distant metastasis, accurate and reliable biomarkers that predict patients at the highest risk for local recurrence have yet to be defined.

In the current study, we used the three-way interaction model for the first time to identify critical genes and biological pathways associated with distant metastasis in HNSCC. This computational method has been applied increasingly to study potential drug targets in numerous diseases26,27,28,29,30 and human age-associated genes31, and to discover central microbial species and environmental factors of the microbial community32.

Results

Determining statistically significant triplets

The LA score for every possible triplet combination, i.e., Z/{X and Y}, was calculated using the fastLA method33. The changes in FDR versus –log (p-value) for the first 200,000 results of fastLA are shown in Fig. 1A. Considering an FDR < 0.1, a set of 768 significant triplet combinations was selected for further analyses. The list of all statistically significant triplets is presented in Table S1.
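The FDR-based selection can be sketched as follows. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg adjustment used here is an assumption, and the p-values are made-up placeholders rather than fastLA output.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are forced to be monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four triplets (placeholders, not study results).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)

# Keep the triplets whose adjusted value is below the FDR threshold of 0.1.
selected = [i for i, q in enumerate(example_q) if q < 0.1]
print(example_q, selected)
```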

Fig. 1

Analysis of LA score and assessment of accuracy using the fastLA method. (A) FDR versus –log(p-value) for the first 200,000 fastLA results, with (x, y) = (6.3, 0.1) indicating the FDR threshold of 0.1 for selecting significant triplet combinations. (B) The accuracy of fastLA was assessed by comparing the observed event rate of Z position genes (red line) with a randomly generated event rate (blue line) across various p-values, indicating a higher number of significant events compared to random chance.

To assess the accuracy of the fastLA analysis, we compared the observed event rate of Z position (switch) genes across a wide range of significant fastLA p-values with a randomly generated event rate. The plots of such comparison have been presented in Fig. 1B.
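The comparison can be illustrated with a small simulation. The exact definition of the event rate is not given in this section, so the sketch below assumes one plausible reading: the number of distinct genes found in the Z position among the selected triplets, compared with the number expected if Z genes were drawn uniformly at random. All names and data are hypothetical.

```python
import random


def distinct_switch_genes(triplets):
    """Count the distinct genes that occupy the Z position."""
    return len({z for z, _, _ in triplets})


def random_distinct_switch_genes(n_triplets, gene_pool, n_draws=200, seed=0):
    """Average number of distinct Z genes when Z is drawn uniformly at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        total += len({rng.choice(gene_pool) for _ in range(n_triplets)})
    return total / n_draws


# Hypothetical triplets written as (Z, X, Y); two genes dominate the Z position.
toy_triplets = [
    ("Z1", "A", "B"), ("Z1", "C", "D"), ("Z1", "E", "F"),
    ("Z2", "A", "C"), ("Z2", "B", "D"), ("Z2", "E", "A"),
]
toy_pool = [f"G{i}" for i in range(100)]

observed = distinct_switch_genes(toy_triplets)
expected_random = random_distinct_switch_genes(len(toy_triplets), toy_pool)
print(observed, expected_random)
```

Under this reading, an observed count well below the random expectation would indicate that a small set of genes recurs in the Z position.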

Figure S2 provides additional data, including box plots of data before and after normalization and a histogram of p-values for the first 200,000 triplets.

Screening biologically-relevant triplets

We employed Gene Set Enrichment Analysis (GSEA) to identify biologically relevant triplets, using a significance threshold of p-value < 0.05 and FDR < 0.1 for all genes involved in the 768 statistically significant triplets. Only terms at level 7 or higher in the Gene Ontology were included to enhance specificity, as lower-level terms tend to be more general. The main enriched terms in the “biological process” and “KEGG pathway” categories are listed in Table 1.

Table 1 The main enriched terms in the “biological process” and “KEGG pathway” categories.

According to the three-way interaction concept, we expect genes X and Y to participate in the same biological processes or pathways. Hence, we traced genes X and Y within the enriched biological processes or pathways for each statistically significant triplet to identify biologically relevant triplets. Among these, 26 triplets met the above criteria. The comprehensive list of such biologically-relevant triplets is available in Table 2, and the association between biologically-relevant triplets and enriched biological process or pathway terms is presented in Fig. 2.
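The screening rule described above can be sketched as a simple set-membership check: a triplet is retained when its X and Y genes appear together in at least one enriched term. The gene and term names below are placeholders, not entries from Table 1 or Table 2.

```python
def biologically_relevant(triplets, enriched_terms):
    """Keep triplets whose X and Y genes share at least one enriched term.

    triplets: list of (z, x, y) gene names.
    enriched_terms: dict mapping a term name to the set of genes annotated to it.
    """
    hits = []
    for z, x, y in triplets:
        shared = sorted(
            term for term, genes in enriched_terms.items()
            if x in genes and y in genes
        )
        if shared:
            hits.append(((z, x, y), shared))
    return hits


# Hypothetical inputs for illustration only.
toy_terms = {
    "term_A": {"GeneX1", "GeneY1", "GeneX2"},
    "term_B": {"GeneY2", "GeneX3"},
}
toy_candidates = [
    ("GeneZ1", "GeneX1", "GeneY1"),  # X and Y share term_A
    ("GeneZ2", "GeneX2", "GeneY2"),  # X and Y are in different terms
]

print(biologically_relevant(toy_candidates, toy_terms))
```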

Table 2 The list of 26 biologically-relevant triplets based on gene set enrichment analysis.
Fig. 2

Association between biologically-relevant triplets and enriched biological processes or pathways. This figure illustrates the association between significant triplet combinations identified using the fastLA method and their corresponding enriched biological processes or pathways. Each triplet is mapped to specific biological terms representing underlying biological functions or pathways, highlighting the relevance of these triplets in biological contexts.

As an additional effort to detect biologically relevant triplets, we reconstructed the Gene Regulatory Network (GRN) using the ARACNE algorithm. The regulatory connections among significant triplets identified through the LA method were mapped within this network. Conclusively, the biological relevance of six statistically significant triplets was confirmed during both the GSEA and the GRN reconstruction. The results have been depicted as a sub-network in Fig. 3.
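For readers unfamiliar with ARACNE, its characteristic step is pruning by the data processing inequality (DPI): within every fully connected gene triple, the edge with the lowest mutual information is treated as indirect and removed. The sketch below shows only that pruning step on a made-up mutual information table. It omits the mutual information estimation and thresholding stages, and it is not the implementation used in this study.

```python
from itertools import combinations


def dpi_prune(mi):
    """Remove the weakest edge of every fully connected gene triple.

    mi: dict mapping frozenset({gene_a, gene_b}) to a mutual information value.
    Decisions are made on the original values, then applied together.
    """
    genes = sorted({g for pair in mi for g in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(e in mi for e in edges):
            weakest = min(edges, key=lambda e: mi[e])
            others = [mi[e] for e in edges if e != weakest]
            # Strict inequality: exact ties are left untouched.
            if mi[weakest] < min(others):
                to_remove.add(weakest)
    return {e: v for e, v in mi.items() if e not in to_remove}


# Hypothetical mutual information values for three genes.
toy_mi = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G2", "G3")): 0.8,
    frozenset(("G1", "G3")): 0.2,  # weakest edge, likely indirect via G2
}
pruned = dpi_prune(toy_mi)
print(sorted(tuple(sorted(e)) for e in pruned))
```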

Fig. 3

Gene Regulatory Network (GRN) reconstruction of biologically-relevant triplets. This figure illustrates the Gene Regulatory Network (GRN) reconstructed using the ARACNE algorithm to detect biologically-relevant triplets. The regulatory connections among significant triplets identified through the LA method are mapped within this network. The biological relevance of six statistically significant triplets was confirmed during both Gene Set Enrichment Analysis (GSEA) and GRN reconstruction. These confirmed triplets and their interactions are depicted as a sub-network in this figure, highlighting key regulatory relationships among the identified triplets.

Validation of central genes at the protein level

The validity of the biologically relevant triplets was assessed at the protein level through the UALCAN portal. The protein expression levels of their involved genes were compared between metastasizing (M) and non-metastasizing (NM) groups across four major metastatic pathways: PTK, NRF2, mTOR, and Hippo. The results revealed significant dysregulation in two proteins (GINS2 and AKT2) from the Gins2/{Akt2, Anxa2} triplet within the mTOR pathway. Moreover, two proteins (HOMER2 and KEAP1) from the Homer2/{Keap1 and Edn3} triplet were dysregulated in the Hippo pathway (See Fig. 4).

Fig. 4

Protein expression levels of biologically-relevant triplets assessed through UALCAN portal. The validity of biologically-relevant triplets was assessed at the protein level using the UALCAN portal. Protein expression levels of genes involved in these triplets were compared between metastasizing (M) and non-metastasizing (NM) groups across four major metastatic pathways: PTK, NRF2, mTOR, and Hippo. Significant dysregulation was observed in two proteins, GINS2 and AKT2, from the Gins2/{Akt2, Anxa2} triplet within the mTOR pathway. Additionally, dysregulation was detected in two proteins, HOMER2 and KEAP1, from the Homer2/{Keap1 and Edn3} triplet in the Hippo pathway. The box plots show the Z-values of protein expression levels for each group.

Such multi-level validation emphasizes the consistency of the gene expression patterns across protein levels, reinforcing the biological significance of these genes in HNSCC pathogenesis.

The results of the above analyses illustrated that the Gins2/{Akt2, Anxa2} triplet is distinguished from other triplets due to its distinctive attributes. First, this triplet is statistically significant and biologically relevant, as confirmed by both Gene Set Enrichment Analysis (GSEA) and Gene Regulatory Network (GRN) analyses. Secondly, the dysregulation of GINS2 and AKT2 in metastatic patients compared to non-metastatic individuals was ascertained at the protein level through in-silico validation.

The scatter plots for this triplet, grouped into three bins based on the expression levels of its corresponding Z gene, are shown in Fig. 5. These plots highlight a significant dynamic in the correlation between genes X and Y due to changes in Z expression levels.
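The binning used for such plots can be sketched as follows: samples are sorted by the expression of the Z gene, split into three equally sized groups (low, medium, high), and the Pearson correlation between X and Y is computed within each group. The data are synthetic and the function names are illustrative assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def binned_correlations(x, y, z, n_bins=3):
    """Correlation of x and y within bins of samples ordered by z."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    bounds = [round(k * len(order) / n_bins) for k in range(n_bins + 1)]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        idx = order[lo:hi]
        result.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    return result


# Synthetic data (hypothetical): the sign of the X-Y relation follows Z.
rng = random.Random(1)
z_expr = [rng.gauss(0, 1) for _ in range(300)]
x_expr = [rng.gauss(0, 1) for _ in range(300)]
y_expr = [(1 if zi > 0 else -1) * xi + 0.5 * rng.gauss(0, 1)
          for xi, zi in zip(x_expr, z_expr)]

r_low, r_mid, r_high = binned_correlations(x_expr, y_expr, z_expr)
print(r_low, r_mid, r_high)
```

A sign change between the low and high bins, with a weaker correlation in the middle bin, is the pattern a switch gene is expected to produce.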

Fig. 5

Dynamic co-expression pattern in the biologically relevant Gins2/{Akt2, Anxa2} triplet. According to the concept of three-way interactions, the dynamic changes in co-expression patterns of genes X and Y in response to fluctuations in the expression of gene Z are of central importance in the hypothetical Z/{X, Y} triplet. Therefore, the co-expression pattern of the Akt2 and Anxa2 genes is illustrated in three bins based on the expression levels of the Gins2 gene. These bins categorize the expression levels of Gins2 into low, medium, and high groups. When the normalized expression level of Gins2 is low, as commonly observed in non-metastasizing (NM) samples, there is an inverse correlation between Akt2 and Anxa2 expression levels (r_low = − 0.48). Conversely, in the high normalized expression range of Gins2, typical of metastasizing (M) samples, a direct correlation exists between Akt2 and Anxa2 expression levels (r_high = 0.42). Furthermore, the correlation between Akt2 and Anxa2 expression levels in the transit state, when the normalized expression level of Gins2 is moderate, is near zero (r_transit = 0.11). These results illustrate a dynamic co-expression relationship between Akt2 and Anxa2, modulated by changes in Gins2 expression levels. The expression profile changes of Akt2 and Anxa2 between metastatic and non-metastatic HNSCC samples are reported in Fig. S3.

Survival analysis

Survival analysis was conducted to evaluate the effect of switch gene expression levels on metastasis-free survival (MFS) over time. For each candidate switch gene, the samples were stratified into low-, moderate-, and high-expression groups. Survival outcomes were compared between the low- and high-expression groups. The Kaplan-Meier survival curves, presented in Fig. 6, illustrate that low expression levels of Dmtn, Camk2a, C19orf33, and A4galt are associated with shorter MFS. Conversely, high expression levels of Usp13, Dffa, and Fam181b correspond to shorter MFS.
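For reference, the Kaplan-Meier estimate behind such curves can be written in a few lines. The sketch below is a minimal product-limit estimator applied to made-up follow-up times. It does not include the group comparison test, and it is not the software used in this study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time of each sample.
    events: 1 if metastasis was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all samples that share the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up data for one expression group.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(toy_curve)
```

Running the estimator separately on the low- and high-expression groups of a switch gene yields the two curves that are then compared.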

Fig. 6

Kaplan-Meier survival curves illustrating the association between gene expression levels and metastasis-free survival (MFS). The Kaplan-Meier survival curves show the relationship between the expression levels of specific genes and metastasis-free survival (MFS). Low expression levels of Dmtn, Camk2a, C19orf33, and A4galt are associated with shorter MFS. Conversely, high expression levels of Usp13, Dffa, and Fam181b correspond to shorter MFS. These curves demonstrate the impact of gene expression levels on the survival outcomes of patients, highlighting the prognostic significance of these genes.

Notably, the genes C19orf33 and Usp13 might be particularly significant for prognostic purposes, as the absolute fold change in their expression levels exceeds 1.5 (see Fig. 7). However, further experimental studies are necessary to confirm these relationships and comprehensively understand the biological implications of the altered gene expression observed in HNSCC patients.

Fig. 7

Fold change in expression levels of genes C19orf33 and Usp13. This figure presents the absolute fold change in the expression levels of genes C19orf33 and Usp13. Both genes exhibit significant changes, with an absolute fold change exceeding 1.5. These substantial changes in expression levels highlight the potential prognostic significance of C19orf33 and Usp13, suggesting their importance in the context of disease progression and patient outcomes.

It should be mentioned that, using a p-value threshold of 0.01, we found no statistically significant correlation between Gins2 gene expression and survival outcomes in HNSCC. The survival analysis results for all the examined switch genes are reported in Table S4.

Discussion

Distant metastasis is a significant factor associated with poor prognosis and reduced survival in HNSCC, yet it remains unpredictable with current tumor biomarkers4. On the other hand, developing effective treatments is challenging without a clear understanding of the mechanisms underlying metastasis. Thus, a deeper understanding of these mechanisms is crucial for improving disease treatment.

Gene expression profiling data offer snapshots of the activities of thousands of genes simultaneously, facilitating the systematic study of gene interactions. This study used a three-way interaction approach to identify dynamic gene co-expression changes that the classical two-way interaction approach cannot detect. The current study aims to provide insights into the biological pathways and critical genes associated with distant metastasis in HNSCC.

The LA method employed in this study was originally proposed by Li et al.7. This method has been extensively discussed in the literature as a valid and reliable approach for capturing dynamic co-expression relationships, particularly under varying cellular conditions31,32,34. To verify the accuracy of the fastLA statistical analysis, we compared the observed event rate of genes associated with the Z position (switch) across a broad spectrum of statistically significant fastLA p-values with a random event rate. From a biological perspective, the number of genes occupying the Z position is expected to be significantly lower than expected by chance, because a limited number of genes typically govern most biological processes. As shown in Fig. 1B, the observed event rate for switch genes differs significantly from the random expectation, indicating that certain genes predominantly occupy the Z positions in statistically significant triplets.

The enriched pathways in the pathogenesis of HNSCC

The following discusses the role of enriched pathways and biological processes in the pathogenesis of HNSCC through the literature.

Ras-associated protein 1 (Rap1) signaling pathway: Rap1, a small GTPase, is known to promote cell migration, invasion, adhesion, and differentiation in several types of cancer35,36, including SCC37. The Rap1 signaling pathway is a complex and multifaceted pathway that plays a crucial role in SCC development and progression through various mechanisms. It promotes epithelial-mesenchymal transition (EMT) by activating the AKT signaling pathway38, enhances cell-matrix adhesion via fibronectin-induced α5β1 integrin37, and increases invasiveness through β-catenin stability39. Aberrant activation of this pathway contributes to the invasive, metastatic, and aggressive nature of SCC.

Positive regulation of transcription, DNA-templated process is a fundamental cellular process that is involved in gene expression activation and repair. Previous reports indicated that aberrations in this process are associated with the metastasis of several cancers, including lung adenocarcinoma40 and nasopharyngeal carcinoma41. Moreover, the importance of the “regulation of transcription, DNA-templated” process in the progression of oral squamous cancer cells (SCC)42 and esophageal SCC43 has been reported previously.

Cellular protein metabolic process is essential for the normal functioning of cells, including growth, differentiation, and repair. However, an aberration of this process has been reported in oral SCC44,45. Interestingly, proteolysis, a key protein metabolic process in SCC, provides essential amino acids for protein synthesis and energy generation while potentially releasing harmful pro-inflammatory cytokines and apoptosis-resistant proteins46,47.

Actin organization is a fundamental cellular process that involves the assembly, disassembly, and rearrangement of actin filaments. Moreover, actin filament assembly plays a crucial role in cell motility, adhesion, and division. Therefore, dysregulation of actin filament organization can promote several mechanisms that contribute to SCC progression, including epithelial-mesenchymal transition (EMT), cell division48, cell migration49, cell invasion50 and apoptosis resistance51.

Hemopoiesis is a tightly regulated process that forms the blood’s cellular components52. There is growing evidence that hemopoiesis may play a role in SCC development. A previous study has suggested that MEIS1 (Myeloid ecotropic viral integration site 1), a hematopoiesis-associated transcription factor, promotes the expression of stem cell markers in esophageal SCC53. Moreover, several studies have indicated that patients who underwent hematopoietic stem cell transplantation have a high baseline risk of HNSCC54,55,56.

Phosphate metabolic processes can be involved in carcinogenesis through the regulation of cell proliferation, cell migration and energy production57,58,59. A previous study has suggested that inhibition of EGFR, which is over-expressed in cutaneous SCC, can suppress genes associated with “regulation of phosphate metabolic process”60. Moreover, another study has indicated that genes in this biological process are targets of Hsa-miR-181a, a critical miRNA in SCC pathogenesis61.

Cilium, resembling an antenna protruding from the cell surface, plays a critical role in the transduction of cellular signaling cascades that govern cell proliferation, differentiation, and migration. Growing evidence suggests that defects in the cilium can result in a spectrum of human diseases known as ciliopathies, and ciliary deregulation also contributes significantly to tumor formation and progression. Remarkably, restoring the integrity of cilia can inhibit cancer cell proliferation in some cases62,63,64. Furthermore, previous studies have highlighted the crucial role of “cilium assembly” in the pathogenesis of HNSCC65,66.

Gonad development is a complex process involving transforming primordial germ cells into mature gametes. Although this process is seemingly unrelated to carcinogenesis, the enrichment of “gonad development processes” in HNSCC-related genes indicates that proteins with multiple molecular roles can intertwine different biological processes. Genes such as LHX267, MMP768 and E-cadherin69,70, involved in gonad development, also play crucial roles in SCC pathogenesis.

Dynamic co-expression relationships in the Gins2/{Akt2, Anxa2} Triplet

The Gins2/{Akt2, Anxa2} triplet was chosen for an in-depth discussion due to its distinctive attributes compared to other triplets. First, this triplet is not only statistically significant but also biologically relevant, as evidenced by both GSEA and GRN analyses. Second, dysregulation of GINS2 and AKT2 in metastatic patients compared to non-metastatic ones was determined at the protein level, according to in-silico validation.

Our results demonstrate a dynamic co-expression relationship between Akt2 and Anxa2, modulated by changes in Gins2 expression levels. Specifically, when the normalized expression level of Gins2 ranges between − 2.38 and − 0.42, as commonly observed in non-metastasizing (NM) samples, there is an inverse correlation between Akt2 and Anxa2 expression levels (r_low = − 0.48). Conversely, in the normalized expression range of 0.42 to 2.38, typical of metastasizing (M) samples, a direct correlation exists between Akt2 and Anxa2 expression levels (r_high = 0.42), as illustrated in Fig. 5.

Furthermore, the regulatory relationship between Gins2 and the other two genes in the Gins2/{Akt2, Anxa2} triplet, Akt2 and Anxa2, is evident in the GRN. The relationship between Gins2 and Akt2 is mediated by two genes, namely Emcn and CD177. Similarly, the relationship between Gins2 and Anxa2 is mediated by three genes, namely Fbxo2, Crip2, and Anxa2p2 (Fig. 3).

In the following, the relationships among Akt2, Anxa2 and Gins2 are discussed based on the literature.

The Anxa2 and Akt2 genes are intimately associated, promoting cellular signaling pathways that govern diverse biological processes, including epithelial-mesenchymal transition71, angiogenesis72, proliferation, apoptosis, migration and invasion73,74. Specifically, Anxa2 regulates PI3K/AKT signaling cascade in various carcinogenic and non-cancerous diseases, such as osteosarcoma75, lung cancer76, colorectal cancer77, breast cancer71, retinal neovascularization78 and ischemic stroke72. On the other hand, the evidence has indicated that inhibiting the Anxa2 gene disrupts the AKT signaling pathway, consequently inhibiting cell proliferation, migration, and invasion while promoting apoptosis75,79.

Based on our results, the Gins2 gene acts as the switch gene for the {Akt2, Anxa2} gene pair in the context of HNSCC. Recent research highlights that Gins2 facilitates cancer development via the PI3K/AKT/mTOR pathway in various cancer types80,81. This evidence underscores the significant relationship between Gins2 and Akt2, reinforcing the crucial role of Gins2 in modulating the interactions between Akt2 and Anxa2.

The role of Gins2, Akt2, and Anxa2 in HNSCC

In the following sections, we discuss the roles of the three genes involved in the Gins2/{Akt2, Anxa2} triplet in the pathogenesis of HNSCC based on a review of the literature.

Go-Ichi-Ni-San complex subunit 2 (GINS2) is a critical component of the GINS complex, essential for DNA replication and cell cycle progression. This gene contributes to cancer progression by promoting tumor cell proliferation and migration, inhibiting apoptosis, and impeding cell cycle arrest. Notably, Gins2 overexpression has been observed at both mRNA and protein levels in various aggressive human tumors82,83,84. Its specific role in SCC is less frequently studied, but a meta-analysis has reported that the expression levels of Gins2 are correlated with poor disease-free survival in oral SCC85. Additionally, another study has suggested that Gins2 is associated with the immune microenvironment and immune infiltration in lung SCC86.

Akt2 is one of the three isoforms of protein kinase B (Akt), a serine/threonine kinase pivotal in cellular signaling. Akt is activated by diverse stimuli such as hormones, growth factors, cytokines, and integrins, leading to various cellular processes, including proliferation, protein synthesis, autophagy, and cell survival. In HNSCC, Akt activation promotes cell migration and invasion through the regulation of EMT and cytoskeletal remodeling87. Furthermore, the overexpression of the Akt2 gene has been reported in several SCC-related studies, both at mRNA and protein levels88,89. Moreover, polymorphisms in the Akt2 gene have been suggested to contribute to SCC susceptibility90,91,92.

ANXA2 is a calcium-dependent phospholipid-binding protein involved in several cellular processes, including proliferation, migration, autophagy, EMT and invasion. Recognized as a tumor-associated protein, ANXA2 is often abnormally expressed in various cancers93, making it a potential therapeutic target94. The evidence shows that ANXA2 is involved in cell migration by inhibiting the EMT via the Twist/Snail pathway, leading to morphological changes and the dissolution of adhesive junctions93. Moreover, previous studies have emphasized the up-regulation of Anxa2 in SCC both at mRNA and protein levels and its critical role in the migration and invasion capabilities of cancer cells95,96,97,98.

Role of Gins2, Akt2 and Anxa2 genes in PI3K/AKT/mTOR (PAM) signaling pathway

The phosphatidylinositol 3-kinase/Akt/mammalian target of rapamycin (PAM) signaling axis plays a pivotal role in various cellular processes, including cell growth and survival. Dysregulation of this pathway is implicated in EMT and metastasis via its influence on cell migration99,100. Aberrations in the PAM signaling pathway occur in approximately 50% of tumors. Given its pro-oncogenic role, the PAM pathway has been considered a potential target for drug development101,102,103,104.

The following discusses the role of Gins2, Akt2 and Anxa2 genes in the PAM signaling pathway.

Akt2 and the PAM Pathway: The AKT2 protein is localized at the plasma membrane. It is intertwined with a range of signaling pathways and is one of the major functional proteins in the PAM signaling pathway. Upon activation by PI3K, Akt2 is phosphorylated and translocates from the plasma membrane to the cytoplasm and nucleus, where it encounters numerous substrates. This PI3K-mediated Akt activation leads to many downstream effects, including the activation of mTOR signaling105. Specifically, Akt2 directly phosphorylates mTOR at Serine 2448, a critical step in the pathway106. It should be noted that mutation and overexpression of the Akt2 gene are prevalent in various cancers, including colorectal cancer101, liver cancer102, breast cancer103, neuroblastoma107 and biliary tract cancer108.

Anxa2 and the PAM Pathway: The Anxa2 gene can be linked to the PAM signaling pathway in various ways. In ovarian cancer cells, Anxa2 promotes mesothelial-mesenchymal transition (MMT), enhancing migration and invasion through this pathway109. Additionally, miR-342 targets Anxa2, activating the PAM pathway and promoting a malignant phenotype in endometrial stromal cells110. Anxa2 also induces EMT and increases migratory capabilities in lung cancer via the PAM signaling pathway111.

Gins2 and the PAM Pathway: Recent studies have highlighted the role of Gins2 in regulating the PAM pathway, influencing proliferation, migration, and metastasis112. The Gins2 gene is up-regulated in various carcinomas, and its knockdown has been shown to suppress the PAM pathway. Moreover, PI3K inhibition can mitigate the effects of Gins2 up-regulation, underscoring its pivotal role in the PAM signaling axis80,113.

The above evidence suggests that the Gins2/{Akt2, Anxa2} triplet may be involved in the PAM signaling pathway. However, further genetic studies are essential to fully elucidate the detailed molecular mechanisms and explore potential therapeutic targets within this pathway.

Gins2 and survival: While our study identified Gins2 as a gene implicated in the metastatic nature of HNSCC, our survival analysis did not reveal a statistically significant correlation between Gins2 expression levels and patient survival outcomes. This lack of significance may arise from the complexity of survival determinants in HNSCC, including tumor heterogeneity, the involvement of redundant pathways, and the multifactorial nature of cancer prognosis114. Furthermore, the role of the Gins2 gene may be more nuanced, affecting disease progression rather than overall survival duration. Further investigations are needed to elucidate the function of Gins2 and its interactions within the tumor microenvironment, and thereby to better understand its implications for patient outcomes.

Conclusion and further work

Recent advances in generating disease-related “omics” datasets have opened valuable research avenues for exploring disease pathways and associated genes. In this study, we employed a three-way interaction approach for the first time to identify critical biomarkers and disrupted biological pathways involved in the metastatic nature of HNSCC. This method can cope with the dynamic nature of co-expression relationships by introducing a switch gene as a surrogate for the intrinsic state variable of cells. Consequently, this approach offers a more detailed and accurate understanding of the cellular alterations underlying the disease. Furthermore, switch genes, as regulators of gene interaction dynamics, present promising therapeutic targets. Our results highlighted the critical role of the Gins2/{Akt2, Anxa2} triplet in the metastasis of HNSCC at mRNA and protein levels. Indeed, Gins2, as a switch gene, together with the gene pair {Akt2, Anxa2}, forms a statistically significant and biologically relevant triplet, potentially serving as a key player in the PI3K/AKT/mTOR signaling pathway. Additionally, survival analysis highlighted C19orf33 and Usp13 as genes with significant prognostic value.

Although our study provided new insights into the nature of HNSCC through computational approaches, further efforts are required to validate these findings. A reasonable approach for in-silico validation of such results is to verify them using additional gene expression datasets. However, several essential prerequisites must be taken into account when selecting a reliable dataset. The most critical prerequisite is an adequate sample size. The LA algorithm is based on correlation coefficients, and the samples must be divided into at least three bins during the LA analysis procedure. Since the statistical significance of a correlation coefficient depends on the number of samples used to compute it, each bin must contain enough samples, and this parameter should be taken into account when selecting a suitable dataset. Another significant prerequisite is the comparability of the samples across datasets, which has two aspects: (i) features, which relate to the design similarity of the corresponding studies, and (ii) gene expression profiles, which can be affected by variations across the platforms used to generate the data.
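The sample-size prerequisite can be illustrated with a small sketch. It is not part of the original analysis: it assumes equal-sized bins and uses the Fisher z approximation to estimate the smallest absolute Pearson correlation that would reach two-sided significance within a single bin. The function names `min_detectable_r` and `per_bin_thresholds` are hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05):
    """Smallest |r| significant at two-sided level alpha for n samples.

    Uses the Fisher z approximation (an assumption of this sketch),
    in which atanh(r) has standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("the Fisher z approximation needs n > 3")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


def per_bin_thresholds(total_samples, bins=3, alpha=0.05):
    """Samples per bin (equal-sized bins assumed) and the matching |r| threshold."""
    n_bin = total_samples // bins
    return n_bin, min_detectable_r(n_bin, alpha)


if __name__ == "__main__":
    # Illustrative totals only; 89 matches the 48 M + 41 NM samples of the main dataset.
    for total in (30, 89, 300):
        n_bin, r_min = per_bin_thresholds(total)
        print(f"{total} samples -> {n_bin} per bin, |r| must exceed {r_min:.2f}")
```

Under these assumptions, splitting a dataset into three bins roughly triples the sample size needed to detect a given within-bin correlation, which is why small validation cohorts are unsuitable for LA analysis.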

To conduct in-silico validation, we explored three well-known omics databases: ArrayExpress115, Gene Expression Omnibus (GEO)116, and The Cancer Genome Atlas (TCGA)117, in search of transcriptomics datasets related to HNSCC. Unfortunately, datasets pertaining to the nature of HNSCC metastasis are scarce in publicly available databases. This scarcity highlights the urgent need for further efforts to collect such data, which would enhance our understanding of this disease.

To validate our in-silico findings, we propose the following experimental framework as a next step:

  1. Gene Silencing: Gins2, identified as a switch gene for the Akt2 and Anxa2 gene pair, should be silenced or knocked down in an HNSCC-relevant cell line using siRNA, shRNA, or CRISPR-Cas9 technologies.

  2. Expression Analysis: The expression levels of Akt2 and Anxa2 should be evaluated both before and after Gins2 silencing. This can be accomplished using quantitative PCR (qPCR) for mRNA analysis and/or Western blotting for protein level detection.

  3. Correlation Assessment: Changes in the expression correlation between Akt2 and Anxa2 following Gins2 silencing should be analyzed to evaluate their LA.

  4. Functional Analysis: To assess the role of the Gins2/{Akt2, Anxa2} axis in promoting metastasis, cellular motility should be evaluated. Migration and invasion assays, such as wound healing or transwell assays, together with soft agar colony formation, should be conducted before and after silencing of these genes, prioritizing Gins2.

Materials and methods

Gene expression profiling dataset

The selected dataset118 includes gene expression data from 48 metastasizing (M) and 41 non-metastasizing (NM) human HNSCC patient samples (with no hormone secretion). These data are available in the ArrayExpress database115 under accession number E-TABM-1328. The data were generated using Affymetrix HG-U133 Plus 2.0 GeneChip arrays.

The microarray data were normalized within- and between-arrays using Robust Multi-array Average (RMA)119 and quantile normalization120 methods, respectively. These methods were implemented in the Affy R package121. Duplicate probes were removed using the “genefilter” R package122, retaining the probe with the highest interquartile range (IQR) of gene expression levels. Moreover, genes that showed no significant changes were removed from the dataset using the empirical Bayes method123, with a p-value threshold of < 0.05.
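The duplicate-probe rule described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the genefilter implementation used in the study: the function names, the probe identifiers, and the probe-to-gene mapping are hypothetical, and the quartiles are computed with the "inclusive" method of the Python standard library.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of one probe's expression values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def collapse_probes(expr, probe_to_gene):
    """For each gene, keep the probe whose expression has the largest IQR.

    expr          : dict mapping probe ID -> list of expression values
    probe_to_gene : dict mapping probe ID -> gene symbol (hypothetical mapping)
    Returns a dict mapping gene symbol -> retained probe ID.
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unannotated probes are skipped in this sketch
        spread = iqr(values)
        if gene not in best or spread > best[gene][1]:
            best[gene] = (probe, spread)
    return {gene: probe for gene, (probe, _) in best.items()}


if __name__ == "__main__":
    # Toy values; probe names are placeholders, not real array identifiers.
    expr = {
        "probe_1": [1, 2, 3, 4, 5],
        "probe_2": [1, 1, 1, 1, 2],
        "probe_3": [0, 10, 20, 30, 40],
    }
    mapping = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
    print(collapse_probes(expr, mapping))
```

Keeping the most variable probe favors the measurement that carries the most information about differences between samples, which matters for the correlation-based analysis that follows.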

Liquid association analysis

To explore dynamic changes in gene co-expression patterns, we computed the liquid association measure for all gene triplets in the main dataset using the fastMLA function in the “fastMLA” R package33. This function employs a modified liquid association algorithm to compute an MLA (Modified Liquid Association) score for each gene triplet, providing insight into the strength of the dynamic correlation between pairs of genes, X and Y, following alterations in a third gene, Z.

In detail, MLA (Z/{X, Y}) can be estimated using the following formula:

$$\widehat{MLA}=\frac{\sum_{i=1}^{M}\widehat{\rho}_{i}\,\bar{Z}_{i}}{M}$$

More specifically, the estimation of MLA (Z/{X, Y}) involves several parameters: M, which represents the number of bins over Z; \(\widehat{\rho}_{i}\), the Pearson’s correlation coefficient between X and Y within the samples of the ith bin; and \(\bar{Z}_{i}\), the mean expression value of Z within the ith bin.
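To make the estimator concrete, the following Python sketch computes the score for a single triplet. It is an illustration under stated assumptions rather than the fastMLA implementation: samples are sorted by Z and split into equal-sized bins, and the function names `pearson` and `mla_score` are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mla_score(x, y, z, n_bins=3):
    """Average over bins of (correlation of X and Y in the bin) * (mean of Z in the bin).

    Binning by sorted Z into equal-sized groups is an assumption of this sketch.
    """
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y and z must have the same length")
    n = len(z)
    if n_bins < 1 or n < 3 * n_bins:
        raise ValueError("need at least three samples per bin")
    order = sorted(range(n), key=z.__getitem__)
    total = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        rho_i = pearson([x[i] for i in idx], [y[i] for i in idx])
        z_bar_i = fmean(z[i] for i in idx)
        total += rho_i * z_bar_i
    return total / n_bins


if __name__ == "__main__":
    # Toy triplet: X and Y are anti-correlated at low Z and correlated at high Z.
    x = [1, 2, 3, 4] * 3
    y = [-1, -2, -3, -4, 1, -1, -1, 1, 1, 2, 3, 4]
    z = [-2] * 4 + [0] * 4 + [2] * 4
    print(mla_score(x, y, z))
```

In the toy data, the correlation between X and Y moves from negative to positive as Z increases, so the score is positive; reversing that trend would flip its sign.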

It is essential to note that prior to running the fastMLA analysis, two preprocessing steps must be undertaken:

  1. The marginal distribution of each variable should be normalized to minimize potential outliers. This normalization is achieved through a normal quantile transformation, as described in Li’s approach7.

  2. Each variable should be standardized to have a mean of 0 and a variance of 1.

The first preprocessing step was carried out using an in-house implementation, while the second was accomplished using the CTT package124. Additionally, the Bonferroni correction method125 was used to estimate the False Discovery Rate (FDR), and liquid association triplets with an FDR less than 0.1 were considered statistically significant.
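The two preprocessing steps can be sketched in Python as shown below. This is not the in-house code or the CTT package used in the study. It assumes the common rank-based form of the normal quantile transformation, in which rank r out of n is mapped to the standard-normal quantile at r / (n + 1), with ties broken by order of appearance, and it standardizes with the sample (n - 1) variance. The function names are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def normal_quantile_transform(values):
    """Replace each value by the standard-normal quantile of its rank.

    Rank r (1..n) maps to the quantile at r / (n + 1); this convention
    and the tie handling are assumptions of this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = normal.inv_cdf(rank / (n + 1))
    return out


def standardize(values):
    """Shift and scale to mean 0 and sample variance 1."""
    mu = fmean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    raw = [5.0, 1.0, 100.0, 2.0, 3.0]  # 100.0 is an outlier on the raw scale
    transformed = normal_quantile_transform(raw)
    print(standardize(transformed))
```

Because only ranks enter the transformation, the extreme raw value is pulled back to the upper normal quantile, which limits the influence of outliers on the within-bin correlations.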

Gene set enrichment analysis

Gene Set Enrichment Analysis (GSEA) was employed to identify biologically relevant triplets and to discern the central pathways and biological processes associated with HNSCC. GSEA is a statistical method used to identify clusters of genes or proteins that are overrepresented in a specific dataset based on predefined annotations126,127. This method is instrumental in elucidating the biological significance of large datasets.

In this study, GSEA was conducted on all genes present in statistically significant triplets. The analysis focused on biological processes and pathways using the Gene Ontology (GO) database43 and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database128, respectively.

For these analyses, the ClueGO tool129, with a Kappa threshold of 0.4, was employed within the Cytoscape v.3.3.0 environment130. Additionally, the enrichment analysis results were validated using the right-sided hypergeometric test coupled with the Benjamini-Hochberg correction method for controlling the false discovery rate131.
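The two statistical ingredients named above, a right-sided hypergeometric test and the Benjamini-Hochberg adjustment, can be sketched as follows. This is an illustrative Python sketch, not the ClueGO implementation; the function names and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_right_tail(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` belong to the term of interest."""
    lower = max(k, 0)
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(lower, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=pvalues.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy example: 8 of 50 selected genes fall in a term annotating 200 of 20000 genes.
    p = hypergeom_right_tail(8, 20000, 200, 50)
    print(p, benjamini_hochberg([p, 0.04, 0.03, 0.5]))
```

The right-sided tail asks only whether a term is overrepresented among the selected genes, which matches the enrichment question posed here; depletion would require the left tail.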

Gene regulatory network construction

A Gene Regulatory Network (GRN) is a model that represents the complex regulatory mechanisms governing the mRNA expression within cells, ultimately influencing cellular function. This network consists of nodes (representing genes) and edges (depicting regulatory relationships), which help predict changes in gene expression under varying conditions.

In our study, we employed ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks)132 to construct the GRN. ARACNE is a reverse-engineering approach designed to build cellular networks based on gene expression data. This algorithm identifies directed regulatory interactions between each transcriptional regulator and its potential target genes using mutual information. ARACNE was executed within the geWorkbench_2.6.0 framework133 for all genes involved in statistically significant triplets, applying a p-value threshold of 0.05.
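The mutual-information idea behind this kind of network reconstruction can be sketched in Python. The sketch below is a simplification and not the geWorkbench ARACNE implementation: it estimates mutual information from equal-frequency discretized expression (an assumption of this sketch), produces undirected edges, omits the p-value filter, and prunes the weakest edge of each gene triangle, a step based on the data processing inequality. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def discretize(values, n_bins=3):
    """Equal-frequency bin labels (0 .. n_bins - 1) assigned by rank."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // n
    return labels


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def dpi_prune(mi, tolerance=0.0):
    """Remove the weakest edge of every fully connected gene triangle.

    mi maps frozenset({gene_a, gene_b}) -> mutual information.
    """
    genes = sorted({g for edge in mi for g in edge})
    removed = set()
    for trio in combinations(genes, 3):
        edges = [frozenset(pair) for pair in combinations(trio, 2)]
        if all(e in mi for e in edges):
            weakest = min(edges, key=mi.__getitem__)
            others = [mi[e] for e in edges if e != weakest]
            if mi[weakest] < min(others) * (1 - tolerance):
                removed.add(weakest)
    return {e: v for e, v in mi.items() if e not in removed}


if __name__ == "__main__":
    # Toy scores for three placeholder genes; the A-C edge is the weakest.
    scores = {frozenset("AB"): 0.9, frozenset("BC"): 0.8, frozenset("AC"): 0.3}
    print(dpi_prune(scores))
```

The pruning step reflects the reasoning that, when two genes interact only through a third, their mutual information cannot exceed that of either direct interaction, so the weakest edge in a triangle is treated as indirect.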

In-silico validation of central genes

The expression of key central genes was assessed at the protein level using clinical proteomic data available through the UALCAN portal134. The portal integrates information from The Cancer Proteome Atlas (TCPA), which contains protein expression data generated via Reverse Phase Protein Array (RPPA). This dataset provided insights into protein abundance across various cancer types and clinical conditions.

Survival analysis

Survival curves were generated using the Kaplan-Meier method, and the log-rank test was employed to evaluate the significance of differences between survival curves. A two-sided p-value of less than 0.01 was considered statistically significant. All statistical analyses were carried out using the “survival”135 and “survminer”136 R packages.
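For readers unfamiliar with the Kaplan-Meier estimator, a minimal Python sketch is given below. It is illustrative only and does not reproduce the “survival” R package; the function name and the toy follow-up data are hypothetical, and the log-rank test is not included.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times  : follow-up time of each patient
    events : 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, estimated survival probability) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored patients leave the risk set without a step
    return curve


if __name__ == "__main__":
    # Toy follow-up data (arbitrary time units).
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients contribute to the number at risk up to their last follow-up but never cause a drop in the curve, which is what allows incomplete follow-up to be used without bias under non-informative censoring.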