Introduction

Graph-structured data maps out intricate relations between entities across the world, from the vast expanses of social networks1 to the dense construction of knowledge graphs2, from the intricate patterns of molecular structures3 to the 3D topologies of manifolds4. This data structure plays an essential part in modeling complex relationships. Graph Neural Networks (GNNs) and their variants are efficient tools for exploring graph-structured data, utilizing node features and graph structure to address challenges in network analysis. This capability makes GNNs widely applicable across various domains, including deciphering molecular structures5, navigating social networks6, formulating product suggestions7, and dissecting software programs8.

Convolution techniques in computer vision9,10 have been applied to graph-structured data, promoting advancements in GNNs. Based on different convolution definitions, GNNs are divided into two categories: spectral-domain11 and spatial-domain12,13,14. Spectral-domain GNNs define graph convolution through the lens of graph signal processing, based on the principle that convolving two signals in the space domain is equivalent to multiplying their Fourier transforms in the frequency domain. This concept originates from Bruna’s work11, with subsequent advancements and refinements made by notable works on ChebNet15, CayleyNet16, and GCN17. Spatial-domain GNNs perform convolution directly on the representations of each node and its neighbors to update states, and exhibit a wide variety of variants according to different strategies for aggregating and integrating neighboring information. In particular, the Graph Attention Network (GAT)18 stands out owing to its attention-based neighborhood aggregation. This architecture enables nodes to weigh the significance of neighboring information during their feature update process. Building upon this, GAT219 introduces dynamic attention, demonstrating more robust and expressive capabilities.

While these methods make use of basic topological information, such as node degrees or edges, during message passing, they do not explicitly incorporate richer topological features. This limitation prevents GNNs from fully leveraging the inherent properties of the graph structure, which are crucial for understanding graph-structured data. For instance, in social networks20, the topological structure can reveal community patterns, influential entities, and the dynamics of information flow. In chemical informatics21, the molecular topology directly influences the chemical properties and reactivity of molecules. In biological networks22, analyzing topological differences helps understand cellular functions and disease mechanisms. To address this limitation, some GNNs23,24,25 leverage topological information by adjusting factors such as message passing weights or by choosing specific nodes for information propagation. You’s and Tian’s works26,27 attempt to enhance node expressiveness by concatenating the extracted topological information with node representations. However, node representations and topology representations are essentially two different modalities. Wang’s and Baltrušaitis’s works28,29 indicate that simply concatenating data from different modalities, while ignoring the interactions between these modalities, may hinder the network from effectively learning useful information from each modality.

Motivated by the above issues, we propose Graph Topology Attention Networks (GTAT) to address the inadequate utilization of topological information and the limitation of unimodal configurations. Specifically, GTAT starts by extracting topology features from the graph’s structure and then encodes them into topology representations. We take the influence of each node’s local topology into account by encoding the topology information as another input to the model. We then compute two types of attention scores and use a cross attention mechanism to process both the node representations and the extracted topology features. This integration enables topology features to be incorporated into node representations and ensures that the relationships in the graph are effectively captured, yielding a more robust and expressive graph model.

The contributions of this paper are summarized as follows:

  • We propose a novel graph neural network framework, GTAT, which enhances the utilization of topological information for processing graph-structured data. In this framework, we treat node feature representations and extracted topology representations as two separate modalities, which are then inputted into the GNN layers.

  • We explore the feasibility of applying the cross attention mechanism in GNNs. Our approach calculates attention scores for both node feature representations and node topology representations, then employs a cross attention mechanism to integrate these two sets of representations. This integration allows the model to dynamically adjust the influence of node features and topological information, enhancing the representation capability.

  • Experimental results on nine diverse datasets demonstrate that our model outperforms state-of-the-art models on classification tasks. Further analysis involving variations in model depth and noise levels reveals GTAT’s capability to mitigate the over-smoothing issue and its increased robustness against noisy data. These results highlight that GTAT can be used as a general architecture and applied to different scenarios.

Related work

Graph neural networks

Different GNNs employ various aggregation schemes for a node to aggregate messages from its neighbors. GCN17 utilizes a layer-wise propagation technique, employing a localized first-order approximation of spectral graph convolutions to encode representations. SAGE30 learns a function to generate embeddings from a node’s local neighborhood, enabling predictions on previously unseen data. SGC31 simplifies the training process by reducing the number of non-linear layers and merges multiple layers of graph convolution into a single linear transformation. FAGCN32 optimizes neighborhood information aggregation by analyzing the spectral properties of graphs, employing different strategies for handling high-frequency and low-frequency signals. Attention mechanism33 empowers GATs to selectively focus on significant neighborhood information while updating node representations, thus pioneering a new approach in graph representation learning. GAT18 employs a self-attention mechanism, which calculates attention coefficients for each neighbor of a node and utilizes them to weight corresponding neighbor features during aggregation, permitting the GAT to assign more considerable weights to more relevant neighbors. GAT219 employs a dynamic attention mechanism to enhance the model’s expressive abilities, accommodating scenarios where different keys possess varying degrees of relevance to different queries.

GNNs with topology

Leveraging graph topology has become increasingly popular in graph representation learning. mGCMN34 incorporates motif-induced adjacency matrices into its message passing framework, adjusting weights to capture complex neighborhood structures. TAGCN35 slides a set of fixed-size learnable filters over the graph, where each filter adapts to the local topology. P-GNNs26 sample multiple sets of anchor nodes and apply a distance-weighted aggregation scheme to differentiate nodes’ positional information. SubGNN36 learns disentangled representations of subgraphs by using a routing mechanism to handle subgraph internal topology, position, and connectivity, enhancing performance on subgraph prediction tasks. To learn deep embeddings of high-order graph-structured data, Hyper-Conv.37 extends traditional graphs, permitting edges to connect any number of vertices and thus altering the aggregation methods among nodes. Given the importance of topological information, we extract and encode it to enhance the model’s representation ability.

Cross attention mechanism

The concept of the cross attention mechanism was first proposed in the Transformer model38. Cross attention bridges two distinct sequences from diverse modalities such as text, sound, or images. It provides a flexible framework that allows for interactions between different modalities39,40, enhancing mutual understanding. Exploiting this concept, the Perceiver model41 processes input byte arrays by alternating between cross attention and latent self-attention blocks. Meta’s Segment Anything Model42 leverages cross attention to connect prompts and image information, fostering enhanced interactions and richer embeddings. MMCA43 uses a cross attention module to generate cross attention maps for each pair of class feature and query sample feature, making the extracted features more discriminative. Recently, some works44,45 have also adopted cross-attention mechanisms in graph-related tasks. However, most of these studies focus on using cross-attention to facilitate interactions between graph modules and non-graph modules. In this study, we employ the cross-attention mechanism to enable modality interaction within the graph module itself, without requiring the assistance of non-GNN modules. This distinction allows for more efficient and intrinsic interactions within the graph structure.

Method

Framework

As illustrated in Figure 1, our framework begins with the topology feature extraction (TFE) for each node. After getting the set of topology representations, we apply Graph Cross Attention (GCA) layers to update node feature representations and topology representations. Lastly, the model utilizes the node feature representations from the final layer to predict node classifications. Our methodology presents an innovative fusion of original feature representations and the topology representations, utilizing a unique cross attention mechanism on graph to enhance the expressive capabilities of each node. The following sections comprehensively elaborate on our approach.

Fig. 1

GTAT framework. Given a graph \(\mathcal {G}\) with \(N\) nodes, along with a set of node feature representations H, we first obtain the GDVs of these nodes through the TFE. Subsequently, we use an MLP to transform the GDVs into a set of topology representations T. Each GTAT layer receives \(\mathcal {G}\) and these two sets of representations as input, then transforms them and outputs two updated sets of representations. Finally, based on the set of node feature representations, our model outputs the node classification predictions.

Topology feature extraction

To extract the information inherent in the graph, we obtain the topology representations based on the graphlet degree vector (GDV)22,46 of each node. The GDV is a count vector that represents the distribution of a node across specific orbits of graphlets. Graphlets, defined as small connected non-isomorphic induced subgraphs within a graph, succinctly capture the neighboring structure of each node in the network. An orbit can be thought of as a unique position or role a node can have within a graphlet. For instance, each node in a triangle (a three-node graphlet) has the same role, so they belong to the same orbit. The GDV counts how many times a node participates in each orbit across the distinct graphlets in its local neighborhood. It thus delivers a measure of the node’s local topological features, enhancing the model’s understanding of the graph structure.

Figure 2 shows all four distinct orbits for graphlets with up to three nodes and the GDV computation for node \(\nu\). In fact, there are 15 distinct orbit types for graphlets with up to four nodes, and 73 types for graphlets with up to five nodes. We utilize the Orbit Counting Algorithm (OCRA)47 to compute the GDVs of nodes within a network. OCRA offers a combinatorial method for enumerating graphlets and orbit signatures of network nodes, reducing the computational complexity of graphlet counting. The time complexities for computing the GDVs of these two dimensionalities are \(O\left( n \cdot d^3\right)\) and \(O\left( n \cdot d^4\right)\), respectively, where n is the number of nodes and d is the maximum node degree.
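To make the orbit counting concrete, the sketch below computes the four 3-node-graphlet orbit counts for every node from simple neighborhood statistics. It is only an illustrative brute-force alternative to OCRA, and the function name and the use of NetworkX are our own choices.

```python
import networkx as nx
import numpy as np

def gdv_3node(G):
    """Naive 3-node-graphlet GDV (not OCRA): for each node, counts of
    orbit 0 (edge endpoint, i.e. degree), orbit 1 (end of an open 2-path),
    orbit 2 (centre of an open 2-path), and orbit 3 (member of a triangle)."""
    gdv = {}
    for v in G.nodes:
        deg = G.degree[v]
        tri = nx.triangles(G, v)                     # number of triangles containing v
        orbit0 = deg
        orbit2 = deg * (deg - 1) // 2 - tri          # neighbour pairs of v that are not connected
        orbit1 = sum(G.degree[u] - 1 for u in G.neighbors(v)) - 2 * tri
        orbit3 = tri
        gdv[v] = np.array([orbit0, orbit1, orbit2, orbit3])
    return gdv
```

On the example graph of Figure 2, such a routine reproduces the GDV [2, 1, 0, 1] for node \(\nu\).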

Fig. 2

Top: the four orbits, shown in different colors. Bottom: the computation of the GDV for node \(\nu\) in graph \(\mathcal {G}\). The diagram illustrates all instances in which node \(\nu\) appears in the four distinct orbits. Correspondingly, the GDV of \(\nu\) is [2, 1, 0, 1], reflecting the number of times \(\nu\) appears in each orbit.

Building on the aforementioned approach, this study employs the GDV as the extracted node topology feature. The dimensionality of each node’s GDV corresponds to the number of orbits, representing its topological characteristics. These GDVs, after being normalized and processed through a multilayer perceptron (MLP)48, serve as the topology representations fed into the network. To balance computational efficiency and prediction accuracy, we employ the 73-dimensional GDV. The comparative experiments are shown in Section 4.
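For illustration, a topology encoder along these lines might look as follows; the hidden sizes, the log scaling of the raw counts, and the L2 normalization are assumptions made for this sketch rather than the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopologyEncoder(nn.Module):
    """Maps raw GDV orbit counts to topology representations T via an MLP."""
    def __init__(self, num_orbits=73, hidden_dim=64, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_orbits, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, gdv):                  # gdv: [N, num_orbits] raw orbit counts
        x = torch.log1p(gdv.float())         # damp the heavy-tailed counts (assumed)
        x = F.normalize(x, p=2, dim=1)       # row-wise normalization (assumed)
        return self.mlp(x)                   # topology representations T: [N, out_dim]
```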

Graph cross attention layer

After obtaining the topology representation, our approach introduces the computation of two types of attention: the feature attention and a novel topology attention, thereby implementing a cross attention mechanism on graph. The structure of GCA layer is depicted in Figure 3.

Our GCA layer receives a set of node feature representations, \({H}_l=\left\{ {h}_1, {h}_2, \ldots , {h}_N\right\}\), and a set of topology representations, \({T}_l=\left\{ {t}_1, {t}_2, \ldots , {t}_N\right\}\), where N is the number of nodes at layer l. Following the methodology in GAT, we calculate the feature attention score between feature representations of nodes and their corresponding neighbors:

$$\begin{aligned} e\left( {h}_i, {h}_j\right) =\operatorname {LeakyReLU}\left( {a}^{\top } \cdot \left[ {W} {h}_{{i}} \Vert {W} {h}_j\right] \right) \end{aligned}$$
(1)

where \({h}_i\) and \({h}_j\) are the feature representations of nodes \(i\) and \(j\), while W and a represent a weight matrix and a shared parameter vector, respectively. This calculation embodies the inherent attributes of the nodes and assigns more considerable weights to more relevant neighbors.

Fig. 3

The structure of the GCA layer. The inputs are a set of node feature representations, \({H}_l \in \mathbb {R}^{N\times F_1}\), and a set of node topology representations, \({T}_l \in \mathbb {R}^{N\times F_2}\), where N is the number of nodes at layer l. After computing two attention matrices, denoted \(\alpha\) and \(\beta\), we employ the message passing (M.P.) mechanism to obtain the new representations \({H}_{l+1} \in \mathbb {R}^{N\times F_3}\) and \({T}_{l+1} \in \mathbb {R}^{N\times F_2}\).

Furthermore, we introduce a new form of attention score, topology attention score. This score is calculated between topology representations of nodes and their corresponding neighbors:

$$\begin{aligned} {e}_t\left( {t}_i, {t}_j\right) = \operatorname {LeakyReLU}\left( {a}_t^{\top } \cdot \left[ {t}_i \Vert {t}_j\right] \right) \end{aligned}$$
(2)

where \({t}_i\) and \({t}_j\) are the topology representations of node \(i\) and node \(j\), with \({a}_t\) being a shared parameter vector. The feature attention scores and topology attention scores are then normalized as:

$$\begin{aligned} \alpha _{i j}=\operatorname {softmax}_j\left( e\left( {h}_i, {h}_j\right) \right) =\frac{\exp \left( e\left( {h}_i, {h}_j\right) \right) }{\sum _{j^{\prime } \in \mathcal {N}_i} \exp \left( e\left( {h}_i, {h}_{j^{\prime }}\right) \right) } \end{aligned}$$
(3)

and

$$\begin{aligned} \beta _{i j}=\operatorname {softmax}_j\left( e_{t}\left( {t}_i, {t}_j\right) \right) =\frac{\exp \left( e_{t}\left( {t}_i, {t}_j\right) \right) }{\sum _{j^{\prime } \in \mathcal {N}_i} \exp \left( e_{t}\left( {t}_i, {t}_{j^{\prime }} \right) \right) } \end{aligned}$$
(4)

where \(\alpha _{i j}\) is the feature attention coefficient between node \(i\) and node \(j\), and \(\beta _{i j}\) serves as the topology attention coefficient, enabling the model to capture the local substructure of each node in the network. Additionally, \(\mathcal {N}_{i}\) represents the set of neighbors of node i, and it can be defined as follows:

$$\begin{aligned} \mathcal {N}_i=\{j \in \mathcal {V} \mid (j, i) \in \mathcal {E}\} \end{aligned}$$
(5)

where \(\mathcal {V}\) represents the set of nodes in the graph, and \(\mathcal {E}\) represents the set of edges.

Following the two attention computations, we implement the cross attention mechanism, which intertwines the node feature representations and the topology representations. The node feature representation is updated with the computed topology attention coefficients as:

$$\begin{aligned} {h}_i^{\prime }=\sigma \left( \sum _{j \in \mathcal {N}_i} \beta _{i j} {W} {h}_j\right) \end{aligned}$$
(6)

where \(\sigma\) is a nonlinearity and \({W} \in \mathbb {R}^{F_3\times F_1}\) represents a weight matrix. Simultaneously, the topology representation is updated with the calculated feature attention coefficients:

$$\begin{aligned} {t}_i^{\prime }=\sigma \left( \sum _{j \in \mathcal {N}_i} \alpha _{i j} {t}_j\right) \end{aligned}$$
(7)

Finally, the layer outputs a new set of node feature representations, \({H}_{l+1}=\left\{ {h}_1^{\prime }, {h}_2^{\prime }, \ldots , {h}_N^{\prime }\right\}\), and a set of topology representations, \({T}_{l+1}=\left\{ {t}_1^{\prime }, {t}_2^{\prime }, \ldots , {t}_N^{\prime }\right\}\).
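A minimal single-head sketch of the GCA layer (Equations 1–7) is given below. For clarity it uses a dense adjacency matrix, an ELU nonlinearity, and simple random initialization; a practical implementation would use sparse message passing and multi-head attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCALayer(nn.Module):
    """Single-head Graph Cross Attention layer (dense sketch of Eqs. 1-7)."""
    def __init__(self, feat_in, feat_out, topo_dim):
        super().__init__()
        self.W = nn.Linear(feat_in, feat_out, bias=False)           # W   in Eqs. (1) and (6)
        self.a = nn.Parameter(torch.randn(2 * feat_out) * 0.01)     # a   in Eq. (1)
        self.a_t = nn.Parameter(torch.randn(2 * topo_dim) * 0.01)   # a_t in Eq. (2)

    def _scores(self, X, vec):
        # e_ij = LeakyReLU(vec^T [x_i || x_j]) for all node pairs (i, j)
        d = X.shape[1]
        src = (X @ vec[:d]).unsqueeze(1)     # contribution of x_i -> rows
        dst = (X @ vec[d:]).unsqueeze(0)     # contribution of x_j -> columns
        return F.leaky_relu(src + dst, 0.2)  # [N, N]

    def forward(self, H, T, adj):
        # H: [N, F1] node features, T: [N, F2] topology reps, adj: [N, N] adjacency
        Wh = self.W(H)
        e = self._scores(Wh, self.a)         # feature attention scores,  Eq. (1)
        e_t = self._scores(T, self.a_t)      # topology attention scores, Eq. (2)
        mask = adj > 0
        neg = torch.finfo(e.dtype).min
        alpha = torch.softmax(e.masked_fill(~mask, neg), dim=1)     # Eq. (3)
        beta = torch.softmax(e_t.masked_fill(~mask, neg), dim=1)    # Eq. (4)
        H_new = F.elu(beta @ Wh)             # Eq. (6): features aggregated with topology attention
        T_new = F.elu(alpha @ T)             # Eq. (7): topology aggregated with feature attention
        return H_new, T_new
```

Stacking such layers and passing the final H to a classifier yields the prediction pipeline of Figure 1.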

It is worth mentioning that the dynamic attention mechanism introduced in GAT2 also performs well across various tasks. The dynamic attention in GAT2 diverges from GAT’s static counterpart by adjusting its weights based on the query, thus accommodating scenarios where different keys possess varying degrees of relevance to different queries. The dynamic attention calculation in GAT2 is formulated as follows:

$$\begin{aligned} e\left( {h}_i, {h}_j\right) ={a}^{\top } \operatorname {LeakyReLU}\left( \left[ {W}{h}_i \Vert {W}{h}_j\right] \right) \end{aligned}$$
(8)

To equip our model with dynamic attention, we further propose another version: GTAT2. In GTAT2, we employ the dynamic attention mechanism of GAT2 to compute the two attention scores, as in Equations 8 and 9:

$$\begin{aligned} {e}_t\left( {t}_i, {t}_j\right) ={a}_t^{\top } \operatorname {LeakyReLU}\left( \left[ {t}_i \Vert {t}_j\right] \right) \end{aligned}$$
(9)

Both the node feature and topology representations in GTAT2 are updated similarly to those in GTAT. Experiments and analysis on GTAT and GTAT2 are conducted subsequently.
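The difference between the static scores of GTAT (Equation 1) and the dynamic scores of GTAT2 (Equations 8 and 9) reduces to the order in which the nonlinearity and the parameter vector are applied, as the following per-pair sketch illustrates (the helper names are our own):

```python
import torch
import torch.nn.functional as F

def static_score(a, W, h_i, h_j):
    # Eq. (1): LeakyReLU(a^T [W h_i || W h_j]) -- neighbour ranking is the same for every query
    return F.leaky_relu(a @ torch.cat([W @ h_i, W @ h_j]), 0.2)

def dynamic_score(a, W, h_i, h_j):
    # Eq. (8): a^T LeakyReLU([W h_i || W h_j]) -- the nonlinearity precedes a, making attention query-dependent
    return a @ F.leaky_relu(torch.cat([W @ h_i, W @ h_j]), 0.2)
```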

The cross interaction between the node and topology representations allows the model to capture both intrinsic node attributes and topological relations, thereby significantly augmenting its prediction accuracy.

Experiments

Datasets

In our experiments, we use nine commonly used benchmark datasets, namely three citation networks datasets (i.e., Cora, Citeseer, and PubMed)49, two Amazon sale datasets (i.e., Computers and Photo)50, two coauthorship datasets (i.e., Physics and CS), one Wikipedia-based dataset (i.e., WikiCS)51, and one arxiv papers dataset (i.e., Arxiv)52. Statistics for all datasets can be found in Table 1. The resources we used are all from the PyTorch Geometric Library53.

Table 1 The statistics of datasets.

Experimental setup

All experiments are implemented in PyTorch and conducted on a server with two NVIDIA GeForce 4090 GPUs (24 GB memory each). We conduct 20 runs, reporting the mean values alongside the standard deviation. The search space for hyper-parameters encompasses: hidden size options of \({\left\{ 8, 16, 32, 64 \right\} }\), learning rate choices of \({\left\{ 0.01, 0.005 \right\} }\), dropout values of \({\left\{ 0.4, 0.6 \right\} }\), weight decay options of \({\left\{ 1E-3, 5E-4 \right\} }\), and a selection of attention heads from \({\left\{ 1, 2, 4, 8 \right\} }\) for models using an attention mechanism. We hold the number of layers constant at 2. All methods utilize an early stopping strategy54 based on validation loss, with a patience of 100, and all are trained using a full-batch approach. In all cases, we randomly select 20 and 30 nodes per class for training and validation, respectively, and the remaining nodes are used for testing. We use the NLL loss as the loss function for the model:

$$\begin{aligned} \mathcal {L}=-\frac{1}{N} \sum _{i=1}^N \sum _{c=1}^C y_{i, c} \log \left( \hat{y}_{i, c}\right) \end{aligned}$$
(10)

where C is the number of classes in the classification task, \(\hat{y}_{i, c}\) is the predicted probability of sample \(i\) being classified into class \(c\), and \(y_{i, c}\) is the ground truth label. We utilize the Adam optimizer55 to minimize the loss function and optimize the parameters of these models.
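For reference, a condensed sketch of this full-batch training procedure with early stopping is shown below; the model call signature and the mask names follow PyTorch Geometric conventions and are assumptions of the sketch, not an exact transcription of our code.

```python
import torch
import torch.nn.functional as F

def train(model, data, gdv, epochs=1000, patience=100, lr=0.01, weight_decay=5e-4):
    """Full-batch training with early stopping on validation loss (sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    best_val, wait, best_state = float("inf"), 0, None
    for _ in range(epochs):
        model.train()
        opt.zero_grad()
        out = model(data.x, gdv, data.edge_index)                  # log-probabilities
        F.nll_loss(out[data.train_mask], data.y[data.train_mask]).backward()  # Eq. (10)
        opt.step()

        model.eval()
        with torch.no_grad():
            out = model(data.x, gdv, data.edge_index)
            val_loss = F.nll_loss(out[data.val_mask], data.y[data.val_mask]).item()
        if val_loss < best_val:                                    # track the best validation loss
            best_val, wait = val_loss, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            wait += 1
            if wait >= patience:                                   # early stopping
                break
    model.load_state_dict(best_state)
    return model
```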

Node classification results

The comparative methods in our study involve nine different algorithms: GCN17, GraphSAGE (SAGE)30, SGC31, FAGCN32, GAT18, GAT219, Hyper-Conv37, mGCMN34 and Dir-GNN56.

Table 2 shows the average accuracy and standard deviation of the different models. GTAT or GTAT2 achieves the best results on all but two datasets. Compared to GATs, GTATs show better performance across all datasets due to the extracted topology features and the cross attention mechanism. Specifically, GTAT achieves an average accuracy improvement of 0.53% across the nine datasets compared to GAT, and GTAT2 outperforms GAT2 with an accuracy improvement of 0.48%. Compared to Hyper-Conv. and mGCMN, which utilize topological information, our model also demonstrates better accuracy. While Hyper-Conv. and mGCMN merely adjust the message-passing pathways or weights based on the extracted topological structure, our method receives the extracted topology features as an additional modality. This mechanism enables GTATs to model the impact of the topological structure on node representations, contributing to more accurate and reliable predictions. Compared to the earlier SGC, GCN, and SAGE models, the GTATs also exhibit superior performance.

FAGCN’s effectiveness on the Physics and CS datasets, where the node features have high dimensions, can be attributed to its adaptive integration of low-frequency and high-frequency signals from the raw features. However, GTATs outperform FAGCN across the other seven datasets. Particularly on the Arxiv dataset, which has low node feature dimensions, GTAT outperforms FAGCN by 4.25%, highlighting GTATs’ capability to achieve higher accuracy with limited node features.

In summary, our GTAT models demonstrate outstanding performance across all nine datasets spanning four distinct data types, showcasing their broad applicability in handling diverse graph-structured data.

Table 2 Accuracy(%) comparison with different models on nine datasets.

Effectiveness of cross attention

To further explore the impact of the cross attention mechanism embedded in our model, we conduct a series of experiments based on GATs with two different configurations. (1) GATs+A updates both the node feature representations H and the topology representations T using the topology attention coefficients \(\beta\). (2) GATs+B updates only the node feature representations H based on the topology attention coefficients \(\beta\), while the topology representations T remain constant. As shown in Table 3, our method achieves the best performance across most datasets, with the exception of Computers. These results support the importance of exploiting both node feature and topology representations through our cross attention mechanism to attain optimal performance.

Table 3 Accuracy(%) comparison with/without cross attention.
Fig. 4

Average classification accuracy after ten runs for different model depths.

Fig. 5

2D t-SNE plot of Physics dataset.

Fig. 6

Accuracy and loss curves on Physics dataset.

Fig. 7

Dirichlet energy for different model depths.

Over-smoothing analysis

A critical challenge in GNNs is the over-smoothing issue57, which limits the number of layers that can be effectively stacked. As the number of layers increases, the nodes become less and less distinguishable, making the performance of the model drop sharply.

To verify whether topology representations and cross attention can alleviate the over-smoothing issue, we select four different types of datasets and compare the performance of GTATs and GATs at varying depths. As shown in Figure 4, there are few significant differences between the models at shallow depths. However, as the depth increases, the GTATs demonstrate more stable performance, avoiding the drastic decline observed in GATs.

Figure 5 displays the t-SNE58 plots of the node representations produced by 20 layers of GAT and GTAT on the Physics dataset. The t-SNE plot provides a visual description of high-dimensional data by projecting it into 2D space, aiding in the identification of relevant patterns. From this visualization, it is evident that GTAT achieves clearer node clustering than GAT. In addition, Figure 6 shows the node classification accuracy curves and loss curves of GATs and our proposed GTATs. It can be seen that GTATs converge more quickly and stably while achieving better accuracy.

Over-smoothing occurs when node representations become increasingly similar, rendering the model incapable of effectively distinguishing between different nodes. To quantify the similarity between node representations, we select the Dirichlet energy (\(E_D\))59 as our metric:

$$E_D = \frac{1}{n_e} \sum _{i,j} A_{ij} \Vert {h}_i - {h}_j\Vert ^2$$

where \(n_e\) denotes the total number of edges, \({h}_i\) represents the representation of node i, and \(A_{ij}\) is the corresponding element in the adjacency matrix. A higher \(E_D\) indicates greater dissimilarity between node representations. Figure 7 shows that the Dirichlet energy at each layer of the GTATs is exponentially higher than that of the GATs, indicating that GTATs better preserve the distinctiveness of node embeddings even as the depth increases.
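For completeness, the Dirichlet energy can be computed directly from the edge list, as in the sketch below, where edge_index follows the PyTorch Geometric convention of listing each edge as a (source, target) pair:

```python
import torch

def dirichlet_energy(H, edge_index):
    """E_D = (1/n_e) * sum over edges (i, j) of ||h_i - h_j||^2."""
    src, dst = edge_index                    # edge_index: [2, n_e]
    diff = H[src] - H[dst]                   # pairwise differences along edges
    return diff.pow(2).sum(dim=1).mean().item()
```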

GTATs’ better performance at deep layers can be attributed to the topology attention in our model architecture, which establishes the relationships between nodes from the perspective of the topology they inhabit. Topology attention enhances the distinctiveness of node feature representations, thereby improving the expressiveness of the model.

Robustness analysis

Better robustness indicates stronger stability of the model when facing noisy data. To evaluate the robustness of the GTATs, we conduct experiments on four different types of datasets and compare the performance of GTATs and GATs under random feature attack (RFA). RFA19 intentionally corrupts node features in the graph to evaluate each model’s ability to withstand the perturbations caused by feature attacks. In particular, the attack is implemented by randomly modifying the node features according to a noise ratio \(0 \le p \le 1\). For node \(i\), its representation is modified as follows:

$${h}_i' ={h}_i + p \cdot noise, \quad noise \sim \mathcal {N}(0, 1)$$

where noise is a vector sampled from a Gaussian distribution, \(\mathcal {N}\), with mean zero and variance one.
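A minimal sketch of this perturbation is given below; the helper name is our own.

```python
import torch

def random_feature_attack(H, p):
    """Add Gaussian noise scaled by the noise ratio p (0 <= p <= 1) to node features."""
    return H + p * torch.randn_like(H)
```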

Figure 8 shows the node classification accuracy on the four datasets as a function of the noise ratio p. As p increases, the accuracy of all models decreases, as expected. However, GTATs show a milder degradation in accuracy than GATs, which exhibit a steeper descent. The experimental results show that GATs, relying solely on node representations, have difficulty adapting to increased noise levels and suffer more obvious performance declines. GTATs’ resilience to noise can be attributed to the extracted topology representations and the cross attention mechanism, both of which allow GTATs to maintain better differentiation and stability of node features under RFA. These results clearly demonstrate the robustness of GTATs over GATs in noisy settings.

Fig. 8

Accuracy in different noise ratio. Each point is an average of 10 runs, error bars show standard deviation.

Efficiency analysis

Similar to other deep learning models, GTAT may need to be deployed on small devices. To compare the scale of the GNN models, we analyze the model parameter counts and their performance across three datasets of varying sizes. For a fair comparison, all models in this study adhere to the same hyperparameters: 2 attention heads, a hidden layer of 64 dimensions, a dropout rate of 0.6, a learning rate of 0.01, and a weight decay of 0.001. As shown in Table 4, GTATs have only a slight increase in parameter counts compared to GATs, yet their performance is notably better. In contrast to GATs, GTATs additionally employ an MLP to convert the GDV into topology representations and \({a}_t\) to calculate topology attention.

In general, the more orbits considered, the more local topological information a node can obtain. GTATs may benefit from richer topology information, but at the cost of a heavier computational burden. To understand the influence of different numbers of orbits on model predictions, we conduct experiments across three distinct dataset scales and measure the time required by OCRA to compute their GDVs. In this study, GTATs_4 denote the models that utilize orbits with up to four nodes, and GTATs_5 denote the versions that utilize orbits with up to five nodes. The results in Table 5 show that orbits with up to five nodes, while taking more time to compute than those with up to four nodes, enhance the accuracy of the predictions. Due to the lack of a more efficient algorithm, employing orbits with up to six nodes, while potentially increasing accuracy, would significantly increase the computational time, especially for larger and denser networks. To balance computational efficiency with accuracy gains, this paper counts the 73 different orbits with up to five nodes as the nodes’ topology features.

Table 4 Accuracy(%) and parameter counts.
Table 5 Accuracy(%) and orbit counts.

Conclusion

In this paper, we introduce GTAT, an innovative framework designed to harness the topological potential of graph-structured data. GTAT distinctively merges node and topology features through a cross attention mechanism, enhancing node representations and capturing graph structure information. Experimental results indicate that our approach outperforms state-of-the-art models on classification tasks. Moreover, the performance of GTAT under variations in depth and noise suggests that its topology representations combined with the cross attention mechanism not only alleviate the over-smoothing issue but also enhance the model’s robustness. Future work will focus on refining GTAT and exploring its potential applications in diverse contexts.