Heterogeneous graph convolutional network for rumor detection with multi-level interactive fusion and graph reconstruction

Liu, Yongping; Wang, Jianliang; Yin, Ming; Zhao, Chunjiang

doi:10.1038/s41598-025-15740-z


Article
Open access
Published: 27 August 2025


Scientific Reports volume 15, Article number: 31639 (2025)

Abstract

Early rumor detection on social media requires joint modeling of semantic content and dynamic propagation patterns, a critical yet challenging task in text mining. While existing methods often focus exclusively on either contextual information or user behavior, we propose MLI-GRA, a heterogeneous graph reconstruction approach that integrates both through multi-level interactive fusion. We first employ a graph auto-encoder framework that integrates semantic information and propagation patterns using multiple graph convolutional networks (GCNs) and a graph reconstruction module. We then build a multi-feature fusion module with an adaptive gated fusion strategy to balance semantic and propagation features through multi-task learning. Experiments on real-world Twitter datasets demonstrate the superiority of our approach, which achieves state-of-the-art (SOTA) results.


Introduction

With the rapid development of mobile Internet technology, online social networks have gradually become integrated into people’s lives, entertainment and work. The explosive growth of information has been accompanied by rumors, and because social media has a large number of users, rumors can cause serious harm to society. Automatically detecting online rumors at an early stage is therefore of great significance for text mining on social media.

Existing studies on automatic rumor detection have mainly focused on designing effective feature extraction approaches for various sources of information, such as semantic features extracted from the contents^1,2,3, social media propagation features of user profiles^4,5,6, retweet propagation representations^7,8,9 and so on. Recently, many studies have attempted to aggregate multi-source features to enhance rumor detection performance. Shu et al.⁹ developed a combined attention network to learn the interpretability of the comments and their corresponding contents. Ma et al.¹⁰ presented a tree-based recursive neural network to capture the semantic information and propagation clues of source tweets for rumor detection.

Most previous rumor detection methods focus on only one of contextual information, user profiles or propagation patterns. In fact, both semantic information and propagation patterns are important for rumor detection on social media, as shown in Fig. 1. The above methods still have several limitations. First, most current detection methods^11,12 attend mainly to contextual semantic information, and the impact of user communication behavior on social media is rarely considered. Second, many models^13,14,15 attempt to integrate users’ social activities, such as user comments, retweets and personal information, as auxiliary information for rumor detection; however, users may simply re-share the source story without leaving any comments¹⁶, which most current studies do not consider. In addition, some studies have shown that rumors tend to be spread by users who lack personal information^17,18. Therefore, social activities might actually be very useful for rumor detection. How to better represent the textual features in the presence of social activities is one of the major challenges for rumor detection.

In this paper, we propose a multi-level interactive rumor detection approach based on heterogeneous graph reconstruction. We first design a graph auto-encoder framework to represent the semantic graph and the user propagation graph, respectively; a multi-feature interactive fusion strategy with adaptive gated fusion is then adopted for rumor detection. The main contributions of our work can be summarized as follows:

A multi-level interactive heterogeneous graph reconstruction approach is presented for rumor detection on social media. Both semantic information and social propagation clues are leveraged to improve rumor detection performance.
A graph auto-encoder framework is designed to obtain the semantic representations and the user propagation representations by reconstructing heterogeneous graphs of social media. A parallel graph convolutional encoder based on GCN and GAT is proposed to represent semantic relations and propagation relations, and a variational graph auto-encoder (VGAE) is used for multi-graph reconstruction.
A multi-feature interactive fusion strategy is adopted with adaptive gated fusion to balance the fused global features and local features for rumor detection.
The experimental results show that our proposed approach outperforms the compared approaches and achieves state-of-the-art (SOTA) scores on two benchmark datasets. An ablation study and parameter experiments further demonstrate the effectiveness of our proposed model.

Related work

Rumor detection

Rumor detection is one of the most challenging tasks in text mining and has attracted increasing attention over the past decades. Most current rumor detection approaches focus on feature fusion strategies based on textual information and social propagation clues. Early work focused on extracting features manually¹⁹. Some studies applied more effective features, such as user comments²⁰ and the emotional attitude of posts²¹. However, because large-scale social media data are anonymous and noisy, over-reliance on manually extracted features is not conducive to rumor detection. Most current rumor detection approaches are therefore built on deep neural networks. Ma et al.²² used recurrent neural networks (RNN), gated recurrent units (GRU) and long short-term memory (LSTM) to learn text representations for rumor detection. Yu et al. utilized convolutional neural networks (CNN) to capture the high-level interactive features of comments for rumor detection. Social activity is another important factor for rumor detection, and some studies^5,9,23 focus on exploring user social activities. Shu et al.²⁴ built an interactive network to detect fake news by exploring the ternary relationship between publishers, news and user information. Liu et al.²⁵ applied RNN and CNN networks to capture the changes in user characteristics along the propagation path.

Graph convolutional network

The graph convolutional network (GCN)²⁶ is another popular neural network structure that is widely used for rumor detection. Several approaches detect fake news by exploring the topological structure of social media^27,28,29. Bian et al.³⁰ leveraged a bi-directional graph convolutional network to represent user characteristics by operating in both top-down and bottom-up directions for rumor detection. Lu et al.³¹ developed a graph-aware co-attention network to recognize fake news by utilizing both source tweets and their corresponding comments. Liu et al.³² designed a VGAE-based^32,33 model to capture text, dissemination and structural information to enhance rumor detection performance. In order to fully utilize both global and local semantic information, Yuan et al. presented a novel global–local attention network to capture the local and global relationships among all source tweets, retweets, and users. However, previous studies have not fully considered the synthesis of semantic information and user-disseminated information.

Method

We consider two types of additional information in the rumor detection task, textual semantic representations and user propagation characteristics, and we build a heterogeneous multi-graph based on tweet-word-user relations. Let $\text{G}=\left(\text{V},\text{E}\right)$ denote the heterogeneous tweet-word-user graph. The node set $\text{V}=\left(\text{P},\text{W},\text{U}\right)$ consists of source tweets, tweet words and users, and the edge set $\text{E}=\left({\text{E}}_{\text{pw}},{\text{E}}_{\text{ww}},{\text{E}}_{\text{pu}}\right)$ contains three types of relations: tweet-word edges, word-word edges and tweet-user edges. ${\text{E}}_{\text{pw}}$ describes the relationship that a tweet contains a word, ${\text{E}}_{\text{ww}}$ expresses the semantic relation between words, and ${\text{E}}_{\text{pu}}$ reflects the interaction between users and tweets. Here $\text{P}=\left\{{\text{p}}_{1},{\text{p}}_{2},\dots ,{\text{p}}_{\text{m}}\right\}$ is the set of source tweets, $\text{W}=\left\{{\text{w}}_{1},{\text{w}}_{2},\dots ,{\text{w}}_{\text{n}}\right\}$ is the set of tweet words, $\text{U}=\left\{{\text{u}}_{1},{\text{u}}_{2},\dots ,{\text{u}}_{\text{o}}\right\}$ is the set of users, and $\text{m}$ is the number of source tweets.

Moreover, each source tweet ${\text{p}}_{\text{i}}$ is associated with a ground-truth label ${\text{y}}_{\text{i}}\in \left\{\text{N},\text{F},\text{T},\text{U}\right\}$ (Non-rumor, False Rumor, True Rumor, and Unverified Rumor). Rumor detection aims to learn a classifier $\text{f}:{\text{p}}_{\text{i}}\to {\text{y}}_{\text{i}}$ that predicts the label of a tweet based on text contents and user propagation clues.

$${\text{y}}_{\text{i}}=\text{f}\left[{\text{p}}_{\text{i}}|\left(\text{V},\text{E}\right)\right]$$

(1)

where ${p}_{i}$ is a source tweet and ${y}_{i}$ is its label.

This paper proposes a multi-level interactive rumor detection approach based on heterogeneous graph reconstruction (MLI-GRA). The architecture of the proposed approach is shown in Fig. 2. The MLI-GRA model consists of three parts: (1) the multiple graph convolutional encoder, (2) the multi-graph reconstruction decoder and (3) the multi-feature rumor detector. We describe each part of the model in detail below.

The multiple graph convolutional encoder

Multi-graph construction

We decompose the heterogeneous tweet-word-user graph into a tweet-word subgraph and a tweet-user subgraph to capture the global semantic features based on the text content and the user dissemination information.

(1) Semantic graph construction.

The nodes in the tweet-word subgraph are the tweet and word nodes of the heterogeneous graph, and the edges between tweets and words are the same as those in the heterogeneous graph. The nodes in the tweet-word subgraph are denoted as ${X}_{pw}=\left\{{x}_{{p}_{1}},{x}_{{p}_{2}},\dots ,{x}_{{p}_{\left|P\right|}},{x}_{{w}_{1}},{x}_{{w}_{2}},\dots ,{x}_{{w}_{\left|W\right|}}\right\},{x}_{{p}_{i}}\in {X}_{P},{x}_{{w}_{i}}\in {X}_{W}$, and the relationships within the tweet-word subgraph are represented as an adjacency matrix ${A}_{pw}$.

We build the edges ${E}_{pw}$ from word occurrences in source tweets. Formally, the weight of the edge between node $i$ and node $j$ is defined as:

$${A}_{pw\left(ij\right)}=\left\{\begin{array}{ll}PMI\left(i,j\right), & i,j\text{ are words}, PMI\left(i,j\right)>0\\ \text{TF-IDF}_{ij}, & i\text{ is tweet}, j\text{ is word}\\ 1, & i=j\\ 0, & \text{otherwise}\end{array}\right.$$

(2)

The PMI value of a word pair i, j is computed as:

$$\left\{\begin{array}{c}PMI\left(i,j\right)=log\frac{p\left(i,j\right)}{p\left(i\right)p\left(j\right)}\\ p\left(i,j\right)=\frac{\#W\left(i,j\right)}{\#W}\\ p\left(i\right)=\frac{\#W\left(i\right)}{\#W}\end{array}\right.$$

(3)

where $\#\text{W}\left(i,j\right)$ denotes the number of sliding windows that contain both word i and word j, $\#\text{W}$ represents the number of sliding windows, and $\#\text{W}\left(i\right)$ indicates the number of sliding windows that contain word i.

The weight of the edge between a source tweet node and a word node is the term frequency-inverse document frequency (TF-IDF) of the word in the source tweet, where the term frequency is the number of times the word appears in the source tweet and the inverse document frequency is the logarithmically scaled inverse fraction of the number of source tweets that contain the word. To utilize global word co-occurrence information, we slide a fixed-size window over all source tweets in the corpus to gather co-occurrence statistics, and we employ point-wise mutual information (PMI), a popular measure of word association, to calculate the weights of the edges ${E}_{ww}$.
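The word-word edge weights of Eqs. (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pmi_edges`, the window size of 3 and the pre-tokenized input are choices made for the example, and a tweet shorter than the window is treated as a single window.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(tweets, window_size=3):
    """Return {(word_i, word_j): PMI} for word pairs whose PMI is positive.

    `tweets` is a list of token lists; `window_size` is an assumed value.
    """
    # Collect all sliding windows over every source tweet.
    windows = []
    for tokens in tweets:
        if len(tokens) <= window_size:
            windows.append(tokens)
        else:
            for start in range(len(tokens) - window_size + 1):
                windows.append(tokens[start:start + window_size])

    n_windows = len(windows)      # #W
    word_count = Counter()        # #W(i)
    pair_count = Counter()        # #W(i, j)
    for window in windows:
        unique = sorted(set(window))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    edges = {}
    for (wi, wj), n_ij in pair_count.items():
        p_ij = n_ij / n_windows
        p_i = word_count[wi] / n_windows
        p_j = word_count[wj] / n_windows
        pmi = math.log(p_ij / (p_i * p_j))
        if pmi > 0:               # Eq. (2) keeps only positive PMI values
            edges[(wi, wj)] = pmi
    return edges
```

Pairs with non-positive PMI simply get no edge, which matches the "otherwise" branch of Eq. (2).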

(2) Propagation graph construction.

The nodes in the tweet-user subgraph are the tweet and user nodes of the heterogeneous graph, and the edges are the tweet-user edges of the heterogeneous graph. The nodes in the tweet-user subgraph are denoted as ${X}_{pu}=\left\{{x}_{{p}_{1}},{x}_{{p}_{2}},\dots ,{x}_{{p}_{\left|P\right|}},{x}_{{u}_{1}},{x}_{{u}_{2}},\dots ,{x}_{{u}_{\left|U\right|}}\right\},{x}_{{p}_{i}}\in {X}_{P}^{,},{x}_{{u}_{i}}\in {X}_{U}^{,}$, where ${X}_{P}^{,}$ and ${X}_{U}^{,}$ are the node representations transformed by the transformation matrix. The relationships within the tweet-user subgraph are represented as an adjacency matrix ${A}_{pu}$.

We calculate the weight of edge ${E}_{pu}$ by the reciprocal of the time the user retweeted or responded to the tweet related to the source tweet. Formally, the weight of edge between node $i$ and node $j$ is defined as:

$${A}_{pu\left(ij\right)}=\left\{\begin{array}{ll}1/\left(t+1\right), & i\text{ is tweet}, j\text{ is user}\\ 1, & i=j\\ 0, & \text{otherwise}\end{array}\right.$$

(4)

where t represents the elapsed time when a user $j$ retweeted or replied to tweets related to a source tweet $i$.
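As a rough illustration of Eq. (4), the sketch below fills a dense tweet-user adjacency matrix from a list of interactions. The function name, the `(tweet_id, user_id, elapsed)` input format and the symmetric fill are assumptions made for the example; the source does not state the time unit of $t$ or whether the matrix is symmetric.

```python
def build_propagation_adjacency(tweet_ids, user_ids, interactions):
    """Return (nodes, adj) for the tweet-user subgraph.

    `interactions` is an iterable of (tweet_id, user_id, elapsed) with
    elapsed >= 0; tweet and user ids are assumed to be distinct.
    """
    nodes = list(tweet_ids) + list(user_ids)
    index = {node: k for k, node in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]   # "otherwise" branch: 0

    for k in range(n):
        adj[k][k] = 1.0                   # i = j branch: self-loops

    for tweet, user, elapsed in interactions:
        weight = 1.0 / (elapsed + 1.0)    # 1 / (t + 1): earlier reactions weigh more
        i, j = index[tweet], index[user]
        adj[i][j] = weight
        adj[j][i] = weight                # assumption: undirected graph
    return nodes, adj
```

An immediate reaction (elapsed time 0) receives weight 1, and the weight decays towards 0 as the reaction comes later.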

Dual channel convolution

GCN can process data with a generalized topological graph structure and deeply explore its characteristics and regularities. Because the neighbors of each node in a subgraph differ in their importance for learning node embeddings for rumor detection, and inspired by graph attention networks (GAT)³⁴, the encoder module uses GCN and GAT separately to learn node characteristics. The GCN extracts features to find appropriate embedding vectors for the nodes in the graph, which are used for graph reconstruction in the subsequent decoder module. The GAT uses an attention mechanism to learn the importance of each node’s neighbors and merges the neighbor representations, weighted by this importance, to form each node’s representation.

For each subgraph, we use GCN to learn a Gaussian distribution and then sample $z$ from this distribution. The Gaussian distribution is uniquely determined by its mean $\mu$ and standard deviation $\sigma$, each of which is learned by a GCN. Finally, a new adjacency matrix is generated by graph reconstruction.

For the adjacency matrices ${A}_{pw}$ and ${A}_{pu}$, we use GCN to learn the mean and standard deviation, respectively, and use the reparameterization²⁸ method so that gradients can be propagated through the sampling step. The formula is as follows:

$${H}_{1}=GCN\left(X,{A}_{pw}\right)={A}_{pw}\sigma \left({A}_{pw}X{W}_{0}\right){W}_{1},$$

(5)

$$\mu ={GCN}_{\mu }\left({H}_{1},{A}_{pw}\right)$$

(6)

$$\text{log}\sigma ={GCN}_{\sigma }\left({H}_{1},{A}_{pw}\right)$$

(7)

$${z}_{pw}=\mu +\varepsilon \sigma$$

(8)

where ${H}_{1}\in {\mathbb{R}}^{n\times v}$ represents the hidden features of the GCN; $\text{X}\in {\mathbb{R}}^{n\times d}$ is the feature matrix of ${X}_{pw}$; $\varepsilon$ is sampled from a standard Gaussian distribution; and ${W}_{0}$, ${W}_{1}$ are the trainable parameter matrices of the GCN. ${GCN}_{\mu }\left({H}_{1},{A}_{pw}\right)$ and ${GCN}_{\sigma }\left({H}_{1},{A}_{pw}\right)$ share the first-layer parameters ${W}_{0}$. Following Eqs. (5), (6), (7) and (8), we use the same calculation to learn a Gaussian distribution for the tweet-user subgraph and sample ${z}_{pu}$.
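The encoding and sampling steps of Eqs. (5) to (8) can be sketched with plain Python lists. Several details are assumptions made for the illustration and are not given in the text: the activation in Eq. (5) is taken to be ReLU, the two heads are taken to be single graph convolutions ($A{H}_{1}{W}_{\mu }$ and $A{H}_{1}{W}_{\sigma }$), and the names `matmul`, `encode`, `w_mu` and `w_sigma` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def encode(adj, x, w0, w1, w_mu, w_sigma, rng):
    """Return (mu, log_sigma, z) for one subgraph.

    `rng` is any object with a gauss(mean, std) method, e.g. random.Random(0).
    """
    # Eq. (5): H1 = A * act(A X W0) * W1, with ReLU assumed as the activation.
    h1 = matmul(matmul(adj, relu(matmul(matmul(adj, x), w0))), w1)
    # Eqs. (6)-(7): assumed single-layer heads on top of H1.
    mu = matmul(matmul(adj, h1), w_mu)
    log_sigma = matmul(matmul(adj, h1), w_sigma)
    # Eq. (8): reparameterization, z = mu + eps * sigma with eps ~ N(0, 1).
    z = [[m + rng.gauss(0.0, 1.0) * math.exp(s) for m, s in zip(mu_row, ls_row)]
         for mu_row, ls_row in zip(mu, log_sigma)]
    return mu, log_sigma, z
```

Because the randomness enters only through `eps`, the sample `z` stays a differentiable function of `mu` and `log_sigma`, which is the point of the reparameterization step. In practice a seeded generator such as `random.Random(0)` can be passed as `rng`.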

In order to obtain sufficient expressive ability, we use the GAT to learn the weights between nodes in the subgraph; the graph attention layer is designed as follows:

$${e}_{ij}=LeakyReLU\left({W}_{a}{x}_{i},{W}_{q}{x}_{j}\right), {x}_{i},{x}_{j}\in {X}_{pw\left(pu\right)}$$

(9)

$${a}_{ij}=softmax\left({e}_{ij}\right)=\frac{exp\left({e}_{ij}\right)}{\sum_{k\in {N}_{i}}exp\left({e}_{ik}\right)}$$

(10)

$${x}_{i}^{,}=\sigma \left(\sum_{j\in {N}_{i}}{a}_{ij}{W}_{k}{x}_{j}\right)$$

(11)

where ${W}_{a}$,${W}_{q}$,${W}_{k}$ are trainable weights and ${a}_{ij}$ is the attention weight between ${x}_{i}$ and ${x}_{j}$.

Finally, we extend the self-attention to multi-head attention to learn more stable embeddings. The multi-head attention can be denoted as:

$${x}_{i}^{,}={||}_{k=1}^{K}\sigma \left(\sum_{j\in {N}_{i}}{a}_{ij}^{k}{W}^{k}{x}_{j}\right)$$

(12)

where || represents concatenation, ${a}_{ij}^{k}$ are the normalized attention coefficients computed by the $k$-th attention mechanism (${a}^{k}$), and ${W}^{k}$ is the weight matrix of the corresponding input linear transformation.
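The attention layer of Eqs. (9) to (12) can be illustrated with a small pure-Python sketch. Eq. (9) does not fully specify how the two projections are combined, so the score is assumed here to be the LeakyReLU of their dot product; the output nonlinearity is assumed to be a sigmoid, and all function names are hypothetical.

```python
import math


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def attention_head(x, neighbors, w_a, w_q, w_k):
    """One head: return the updated feature vector of every node.

    `neighbors[i]` lists the (non-empty) neighbourhood N_i of node i.
    """
    out = []
    for i, x_i in enumerate(x):
        q_i = matvec(w_a, x_i)
        # Eq. (9), assumed form: e_ij = LeakyReLU(<W_a x_i, W_q x_j>).
        scores = [leaky_relu(sum(a * b for a, b in zip(q_i, matvec(w_q, x[j]))))
                  for j in neighbors[i]]
        # Eq. (10): softmax over N_i (shifted by the maximum for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        alphas = [e / sum(exps) for e in exps]
        # Eq. (11): attention-weighted sum of transformed neighbours.
        agg = [0.0] * len(w_k)
        for alpha, j in zip(alphas, neighbors[i]):
            for d, v in enumerate(matvec(w_k, x[j])):
                agg[d] += alpha * v
        out.append([sigmoid(v) for v in agg])
    return out


def multi_head(x, neighbors, heads):
    """Eq. (12): concatenate the outputs of K independent heads."""
    per_head = [attention_head(x, neighbors, *params) for params in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(x))]
```

Each entry of `heads` is a `(w_a, w_q, w_k)` tuple, so with K heads of output size h every node ends up with a vector of length K·h.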

Overall, given the representation ${X}_{pw}$ of the nodes in the tweet-word subgraph and the representation ${X}_{pu}$ of the nodes in the tweet-user subgraph, we input ${X}_{pw}$ and ${X}_{pu}$ into the subgraph attention network to obtain new node representations. The node embeddings in the tweet-word subgraph are denoted as ${X}_{pw}^{,}=\left\{{x}_{{p}_{1}}^{,},{x}_{{p}_{2}}^{,},\dots ,{x}_{{p}_{\left|P\right|}}^{,},{x}_{{w}_{1}}^{,},{x}_{{w}_{2}}^{,},\dots ,{x}_{{w}_{\left|W\right|}}^{,}\right\}$, and the node embeddings in the tweet-user subgraph are denoted as ${X}_{pu}^{,}=\left\{{x}_{{p}_{1}}^{,},{x}_{{p}_{2}}^{,},\dots ,{x}_{{p}_{\left|P\right|}}^{,},{x}_{{u}_{1}}^{,},{x}_{{u}_{2}}^{,},\dots ,{x}_{{u}_{\left|U\right|}}^{,}\right\}$.

Multi-graph reconstruction decoder

VGAE mainly finds suitable embedding vectors for the nodes in the graph and realizes graph reconstruction. We take the matrices ${z}_{pw}$ and ${z}_{pu}$ as the input of the multi-graph reconstruction decoder, with the aim of making the reconstructed adjacency matrix $\widehat{{A}_{pw\left(pu\right)}}$ similar to the original adjacency matrix ${A}_{pw(pu)}$. We use an inner product and a sigmoid function to reconstruct the original graph; the reconstructed adjacency matrix is obtained as:

$$\widehat{{A}_{pw}}=\sigma \left({Z}_{pw}{Z}_{pw}^{T}\right)$$

(13)

$$\widehat{{A}_{pu}}=\sigma \left({Z}_{pu}{Z}_{pu}^{T}\right)$$

(14)

where $\sigma$ is the sigmoid function, and ${Z}_{pw}\in {\mathbb{R}}^{{n}_{1}\times h}$, ${Z}_{pu}\in {\mathbb{R}}^{{n}_{2}\times h}$ are the matrix forms of ${z}_{pw}$ and ${z}_{pu}$, respectively. Since ${Z}_{pw}$ and ${Z}_{pu}$ are obtained through sampling, the noise (standard deviation $\upsigma$) increases the difficulty of reconstructing the adjacency matrix. We apply a cross-entropy loss for the reconstruction of the adjacency matrix; the process can be represented as:

$${\mathcal{L}}_{pw(pu)}=-\frac{1}{{A}_{row}{A}_{col}}\sum \left[m\log \widehat{m}+\left(1-m\right)\log \left(1-\widehat{m}\right)\right]$$

(15)

where m and $\widehat{m}$ are the elements of ${A}_{pw(pu)}$ and $\widehat{{A}_{pw(pu)}}$ respectively.
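The decoder of Eqs. (13) and (14) and the reconstruction loss can be sketched as follows. The code is written with the leading minus sign that is usual for a cross-entropy loss to be minimized, clips probabilities with a small `eps` to avoid `log(0)`, and assumes adjacency entries in [0, 1]; the function names and the clipping constant are choices made for the example.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def reconstruct(z):
    """Eqs. (13)-(14): A_hat = sigmoid(Z Z^T), with Z given as a list of rows."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]


def reconstruction_loss(adj, adj_hat, eps=1e-10):
    """Cross-entropy averaged over all A_row * A_col entries of the matrix."""
    total, count = 0.0, 0
    for row, row_hat in zip(adj, adj_hat):
        for m, m_hat in zip(row, row_hat):
            m_hat = min(max(m_hat, eps), 1.0 - eps)   # keep log() finite
            total += m * math.log(m_hat) + (1.0 - m) * math.log(1.0 - m_hat)
            count += 1
    return -total / count
```

With an all-zero latent matrix every reconstructed entry is 0.5, so the loss equals log 2 regardless of the target, which is a convenient sanity check.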

In order to prevent the noise from collapsing to zero and to ensure that the model retains its generative ability, we add the KL divergence to the loss function. Minimizing it drives the distribution defined by the parameters ($\upmu$ and $\upsigma$) to be as similar as possible to the target distribution (the standard Gaussian distribution). The formula is as follows:

$${\mathcal{L}}_{{\mu ,{\sigma }^{2}}_{pw(pu)}}=\frac{1}{2}\sum_{i=1}^{{n}_{{i}_{\text{1,2}}}}\sum_{j=1}^{{n}_{{d}_{\text{1,2}}}}\left({\mu }_{ij}^{2}+{\sigma }_{ij}^{2}-\log {\sigma }_{ij}^{2}-1\right)$$

(16)

where ${n}_{{d}_{\text{1,2}}}$ is the dimensionality of the latent variable ${Z}_{pw(pu)}$ and ${n}_{{i}_{\text{1,2}}}$ is the number of nodes in the corresponding subgraph.
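For reference, the sketch below evaluates the standard closed-form KL divergence between a diagonal Gaussian with parameters $\mu$, $\sigma$ and the standard Gaussian, $\frac{1}{2}\sum \left({\mu }^{2}+{\sigma }^{2}-\log {\sigma }^{2}-1\right)$. It takes $\log \sigma$ as input, as produced by Eq. (7); the function name is hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_sigma):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over all nodes and latent dimensions.

    `mu` and `log_sigma` are matrices (lists of rows) of the same shape.
    """
    total = 0.0
    for mu_row, ls_row in zip(mu, log_sigma):
        for m, ls in zip(mu_row, ls_row):
            variance = math.exp(2.0 * ls)        # sigma^2
            total += m * m + variance - 2.0 * ls - 1.0   # log(sigma^2) = 2 log(sigma)
    return 0.5 * total
```

The term is zero exactly when every $\mu$ is 0 and every $\sigma$ is 1, and it grows as the learned distribution moves away from the standard Gaussian or as $\sigma$ shrinks towards zero.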

Multi-feature rumor detector

The tweet-word subgraph contains the global semantic relation information of text contents, while the tweet-user subgraph contains the information involved in source tweet propagation. However, when the information contained in the two subgraphs is fused, the large difference between the global semantic features and the user propagation features may cause some useless features to affect the detection performance. Based on this, we propose a decision-level detector to fuse subgraph features, which includes a decision-level global feature fusion strategy and an adaptive gated fusion strategy.

Global feature fusion strategy

Given the node embeddings ${X}_{pw}^{,}$ and ${X}_{pu}^{,}$ and the latent representations ${Z}_{pw}$ and ${Z}_{pu}$ sampled from the Gaussian distribution, which serve as input to the global feature fusion network, the weights of the tweet-word and tweet-user subgraphs are calculated as follows:

$${S}_{pw}^{,}={X}_{pw}^{,}\oplus {Z}_{pw}$$

(17)

$${S}_{pu}^{,}={X}_{pu}^{,}\oplus {Z}_{pu}$$

(18)

$$\left({\beta }_{{\Phi }_{pw}},{\beta }_{{\Phi }_{pu}}\right)={att}_{glo}\left({S}_{pw}^{,},{S}_{pu}^{,}\right)$$

(19)

where ${S}_{pw}^{,}\in {\mathbb{R}}^{{n}_{1}\times v}$ is the global semantic features of the tweet-word subgraph, ${S}_{pu}^{,}\in {\mathbb{R}}^{{n}_{2}\times v}$ is the global user propagation features of the tweet-user subgraph, and ${att}_{glo}$ represents the feedforward neural network that performs the global feature fusion strategy.

In order to learn the weights of the tweet-word subgraph and the tweet-user subgraph, we first transform the representation of each node in the subgraphs by a nonlinear transformation (e.g. a single-layer MLP). Then we measure the importance of each node representation as the similarity of the transformed embedding with a global feature attention vector $q$. Furthermore, we average the importance of all nodes in a subgraph to obtain the importance of that subgraph. The importance of the tweet-word (tweet-user) subgraph, denoted as ${w}_{pw(pu)}$, is computed as follows:

$${w}_{pw(pu)}=\frac{1}{|{S}_{pw(pu)}^{,}|}\sum_{{x}_{i}\in {S}_{pw\left(pu\right)}^{,}}{q}^{T}\cdot \text{tanh}\left({W}_{glo}{x}_{i}+b\right)$$

(20)

where ${W}_{glo}$ is the weight matrix, $b$ is the bias vector and $q$ is the global attention vector. All of these parameters are shared by the tweet-word subgraph and the tweet-user subgraph. After obtaining the importance of each subgraph, we normalize them with the softmax function to obtain ${\beta }_{{\Phi }_{pw(pu)}}$, the weight of the tweet-word (tweet-user) subgraph:

$${\beta }_{{\Phi }_{pw(pu)}}=\frac{exp\left({w}_{pw(pu)}\right)}{\sum_{\Phi \in \left\{pw,pu\right\}}exp\left({w}_{\Phi }\right)}$$

(21)

${\beta }_{{\Phi }_{pw(pu)}}$ can be interpreted as the contribution of the subgraph ${\Phi }_{pw(pu)}$ to the specific task. With the learned weights as coefficients, we fuse the tweet node representations in the subgraphs to obtain the source tweet representations ${P}_{m}$ as follows:

$${P}_{m}=\left\{{p}_{1},{p}_{2},\dots ,{p}_{m}\right\}$$

(22)

$${p}_{i}=\sum_{\Phi \in \left\{pw,pu\right\}}{\beta }_{\Phi }\cdot {p}_{{m}_{i}},{p}_{{m}_{i}}\in {P}_{\Phi }^{,}$$

(23)

where $m$ is the number of source rumors, ${p}_{i}$ denotes the representation of tweet sentence node $i$ in the $\Phi$ subgraph, and ${P}_{\Phi }^{,}$ represents the sentence node representations with global relation information in the $\Phi$ subgraph.
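The subgraph-level attention of Eqs. (20) to (23) can be sketched as below. This is an illustration under assumptions: node features are plain Python lists, the tweet representations of the two subgraphs are assumed to have the same dimension so that they can be mixed, and the function names are hypothetical.

```python
import math


def subgraph_importance(nodes, w, b, q):
    """Eq. (20): average over nodes of q^T tanh(W x + b)."""
    total = 0.0
    for x in nodes:
        hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bias)
                  for row, bias in zip(w, b)]
        total += sum(qi * hi for qi, hi in zip(q, hidden))
    return total / len(nodes)


def fuse_tweets(s_pw, s_pu, tweets_pw, tweets_pu, w, b, q):
    """Eqs. (21)-(23): softmax the two importances, then mix the tweet vectors.

    The parameters w, b and q are shared by both subgraphs, as in the text.
    """
    scores = [subgraph_importance(s_pw, w, b, q),
              subgraph_importance(s_pu, w, b, q)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    beta_pw, beta_pu = (e / sum(exps) for e in exps)
    fused = [[beta_pw * a + beta_pu * c for a, c in zip(p_pw, p_pu)]
             for p_pw, p_pu in zip(tweets_pw, tweets_pu)]
    return (beta_pw, beta_pu), fused
```

Because the two weights come from a softmax they sum to 1, so each fused tweet vector is a convex combination of its semantic and propagation representations.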

Adaptive gated fusion strategy

We concatenate the latent representations of the two subgraphs as input to the adaptive gated fusion unit. A gate unit is designed to promote competition or collaboration between neurons and to select, from each subgraph, the features that are more conducive to rumor detection. The adaptive gated fusion network can be denoted as:

$$\text{S}=\left[{S}_{pw}^{,};{S}_{pu}^{,}\right]$$

(24)

$$\text{g}=\upsigma \left({W}_{gat}\cdot S+b\right)$$

(25)

$${G}_{gat}=\text{tanh}\left(\text{g}\odot S\right)$$

(26)

where $S$ represents the concatenation of the node features of the tweet-word subgraph and the tweet-user subgraph, including global semantic features and user communication relationship features; $\text{g}$ is the state of the adaptive gated fusion unit; and ${G}_{gat}$ denotes the shared feature $S$ after it passes through the adaptive gated fusion unit. ${W}_{gat}$ is the weight matrix, $b$ is the bias vector and $\upsigma$ is the sigmoid activation function.
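A minimal sketch of the gate in Eqs. (24) to (26), written for the feature vectors of a single tweet, is given below. Treating the inputs as per-tweet vectors and requiring `w_gat` to be square (so that the gate has the same length as the concatenated feature) are assumptions of the example; the function name is hypothetical.

```python
import math


def gated_fusion(s_pw, s_pu, w_gat, b):
    """Return G_gat for one tweet given its two subgraph feature vectors."""
    # Eq. (24): S = [S_pw; S_pu], concatenation of the two feature vectors.
    s = list(s_pw) + list(s_pu)
    # Eq. (25): g = sigmoid(W_gat S + b), one gate value per feature.
    gate = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, s)) + bias)))
            for row, bias in zip(w_gat, b)]
    # Eq. (26): G_gat = tanh(g * S), element-wise product then tanh.
    return [math.tanh(g * v) for g, v in zip(gate, s)]
```

A gate value near 0 suppresses the corresponding feature and a value near 1 passes it through almost unchanged, which is how the unit selects between semantic and propagation features.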

As the last layer, the global attention feature ${p}_{i}$ and the local gate feature ${G}_{gat}$ are fed to separate softmax layers for classification. The formulas are as follows:

$$\widehat{{y}_{glo}}=softmax\left({p}_{i}W+b\right)$$

(27)

$$\widehat{{y}_{gat}}=softmax\left({G}_{gat}W+b\right)$$

(28)

We use the cross-entropy loss with a regularization term as the objective function to train the model’s parameters.

$${\mathcal{L}}_{gol}=-\sum_{i\in m}{y}_{i}log\widehat{{y}_{glo}}+\lambda ||\theta {||}_{2}^{2}$$

(29)

$${\mathcal{L}}_{gat}=-\sum_{i\in m}{y}_{i}log\widehat{{y}_{gat}}+\lambda ||\theta {||}_{2}^{2}$$

(30)

$$\mathcal{L}=\upeta {\mathcal{L}}_{gat}+\left(1-\eta \right){\mathcal{L}}_{gol}$$

(31)

where ${y}_{i}$ denotes the ground-truth one-hot vector of the $i$-th source tweet, $\lambda$ represents the trade-off coefficient, $||\cdot {||}_{2}^{2}$ indicates the L2 regularization term that prevents overfitting, and $\eta$ is the break-even parameter.
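The two classification losses and their combination in Eqs. (29) to (31) can be sketched as follows. Labels are given as class indices rather than one-hot vectors, which yields the same cross-entropy; the function names are hypothetical and the value of $\lambda$ is not reported in the text, so any value passed in is an assumption.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_l2(logit_rows, labels, params, lam):
    """Eqs. (29)/(30): summed cross-entropy plus lam * ||theta||_2^2.

    `labels` holds the index of the true class of each source tweet and
    `params` is a flat list of the model parameters theta.
    """
    ce = 0.0
    for logits, label in zip(logit_rows, labels):
        ce -= math.log(softmax(logits)[label])
    return ce + lam * sum(p * p for p in params)


def mixed_loss(l_gat, l_gol, eta):
    """Eq. (31): eta * L_gat + (1 - eta) * L_gol."""
    return eta * l_gat + (1.0 - eta) * l_gol
```

With $\eta$ close to 1 training is driven mainly by the gated branch, and with $\eta$ close to 0 mainly by the global attention branch.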

Joint training

We encode the textual semantic information and user propagation information by the encoder module. The graph reconstruction aims to reconstruct the data to learn the structure information while the multi-feature decision-level decoding aims to classify the event. We jointly train these modules by minimizing the loss over all events and the final loss is computed as:

$$\text{Loss}=\upkappa \cdot \left({\mathcal{L}}_{pw\left(pu\right)}+{\mathcal{L}}_{{\mu ,{\sigma }^{2}}_{pw\left(pu\right)}}\right)+\mathcal{L}$$

(32)

where $\upkappa$ is also a break-even parameter. Since the graph reconstruction loss is far greater than the event classification loss, we introduce this parameter to balance the two terms.
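As a small worked example of Eq. (32), the sketch below combines the loss terms. It assumes that the shorthand $pw(pu)$ means the reconstruction and KL terms of both subgraphs are summed; the function name is hypothetical, and the default $\upkappa =0.1$ is the value reported in the Setting subsection.

```python
def joint_loss(rec_losses, kl_losses, classification_loss, kappa=0.1):
    """Eq. (32): kappa * (reconstruction + KL terms) + classification loss.

    `rec_losses` and `kl_losses` hold one value per subgraph (pw and pu).
    """
    return kappa * (sum(rec_losses) + sum(kl_losses)) + classification_loss
```

For instance, with reconstruction losses of 10 and 20, KL terms of 1 and 2 and a classification loss of 1.6, the graph terms contribute 0.1 × 33 = 3.3 and the total is 4.9, so the scale factor keeps the much larger reconstruction terms from dominating the classification loss.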

Experiment

In this section, we first introduce datasets used in the experiment and then we will evaluate our proposed model on the datasets compared with other baseline models.

Datasets

We evaluate our proposed method on two real-world datasets: Twitter15 and Twitter16⁷, both collected from Twitter, one of the most popular social media sites in the world. The datasets contain 1490 and 818 source tweets, respectively. Nodes refer to source tweets, the words that the source tweets contain, and users; edges represent the tweet-word, word-word and tweet-user relationships; and edge features are given by TF-IDF values, PMI and the time at which the user retweeted or replied to tweets related to the source tweet. The Twitter15 and Twitter16 datasets contain four labels: Non-rumor (N), False Rumor (F), True Rumor (T), and Unverified Rumor (U). The label of each source tweet in Twitter15 and Twitter16 is annotated according to the veracity tag of the article in rumor debunking websites (e.g., snopes.com, Emergent.info, etc.). The statistics of the two datasets are shown in Table 1.

Table 1 Statistics of the datasets.


Setting

We implement our models using the same set of hyperparameters in all experiments. We use the micro-averaged accuracy (Acc.) over all categories and the F1 score, computed from the precision and recall of each category, to evaluate model performance. The batch size is 64. The hidden dimension of the GCN layers is 32. The learning rate is initialized at 5e−4 and gradually decreases during training. Training runs for 30 epochs. We initialize the word vectors with 300-dimensional word embeddings. The number of heads K of the GAT is set to 8 and the hidden size is 32. We select the best break-even parameters: $\eta$ is 0.4 and $\kappa$ is 0.1. The training/validation/test split is 70%/15%/15%.
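The two evaluation measures can be computed as in the sketch below. The function names are hypothetical, the class codes N, F, T and U follow the labels used in the datasets, and returning 0 when a class has no predictions or no true instances is a convention chosen for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of source tweets whose predicted label is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred, classes=("N", "F", "T", "U")):
    """F1 score (harmonic mean of precision and recall) for each class."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[c] = 2 * precision * recall / (precision + recall)
        else:
            scores[c] = 0.0   # convention when the class is never hit
    return scores
```

In single-label multi-class classification, micro-averaged accuracy is simply the overall fraction of correct predictions, which is why `accuracy` needs no per-class bookkeeping.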

Baselines

We compared our model with a range of current baseline rumor detection models and state-of-the-art models as follows:

DTC¹: A rumor detection method using decision tree classifiers with manual features to obtain information credibility.
SVM-TS³⁵: A linear SVM classifier that considers the structure of time series.
SVM-TK⁷: An SVM classifier with a propagation tree kernel based on the propagation structures of rumors.
MVAE³⁶: A multimodal rumor detection model that combines a variational autoencoder and classifiers to explore text and picture information.
RvNN¹⁰: A rumor detection model based on the propagation tree structure that uses GRU units to learn rumor representations.
PPC⁶: A detection model that mines user feature sequences by combining RNN and CNN networks.
GCAN³¹: A propagation-based detection model that uses source tweets and user characteristics, combining GCN and a dual co-attention mechanism.
VAE-GCN³²: A rumor detection model that uses GCN as an encoder and a variational GAE as a decoder to explore the structure of rumor propagation.
BI-GCN³⁰: A GCN-based rumor detection model using semantic bidirectional propagation structure.
GLAN³⁹: A rumor detection model that jointly encodes the global information between source tweets, retweets and users.
HGATRD⁴: A heterogeneous graph attention model based on meta-path, used to capture text semantic features and global propagation features.

Tables 2 and 3 show the performance of all compared methods on the Twitter15 and Twitter16 datasets, including the baseline models and the current state-of-the-art rumor detection models. Bold indicates the highest result on the test set; a slash indicates that the method was not evaluated on that test set. For fair comparison, the results of the compared rumor detection models are quoted directly from the best previously reported performance.

Table 2 Rumor detection results on Twitter15 datasets.


Table 3 Rumor detection results on Twitter16 datasets.


First, the baseline models that use manual features (DTC, SVM-TS, SVM-TK) perform significantly worse than the methods based on deep learning. Deep learning methods are better at mining the effective features of rumors, while methods based on manual features are less accurate and efficient.

Second, compared with the current state-of-the-art rumor detection algorithms, our model performs better, which demonstrates its effectiveness in rumor detection. The GCN-based detection models (GCAN, VAE-GCN, BI-GCN, GLAN, HGATRD) perform better than the other deep learning models (RvNN, PPC), which indicates that GCN can learn more comprehensive information and better node representations from social networks. Because GRU, RNN and CNN cannot process graph-structured data, important structural features in social information are ignored, resulting in performance degradation. The strong performance of VAE-GCN and HGATRD illustrates their superiority in rumor detection tasks. However, these methods ignore the difference between semantic features and propagation representations, and global features are not well utilized. Our method achieves the best performance because it selectively captures the more effective features.

Finally, although our method does not achieve the best score on every evaluation measure when compared with some specific models, the tradeoff among the different performance measures shows its effectiveness in the task of rumor detection.

Parameters experiments

In deep learning, the parameter selection of the model also has a great influence on the experimental results. By adjusting some important parameters in the model, the performance of the model can be improved significantly. In this section, we investigate the sensitivity of parameters.

Graph reconstruction parameters

In the graph reconstruction stage, a good ${z}_{pw}$ should make the reconstructed adjacency matrix $\widehat{{A}_{pw(pu)}}$ similar to the original adjacency matrix ${A}_{pw(pu)}$. However, in actual training the reconstruction loss is far greater than the classification loss of rumor events. We therefore add a scale factor $\kappa$ to balance the training loss; the experimental results are shown in Fig. 3. Performance is best when $\kappa =0.1$. The reason is that graph reconstruction serves to better explore global structural information, whereas event classification is the main objective of the model, so the reconstruction term should be given a smaller scale factor.
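The balancing role of $\kappa$ can be illustrated with a minimal sketch. It assumes an inner-product decoder with a binary cross-entropy reconstruction loss, as in the standard graph auto-encoder, and assumes that $\kappa$ simply multiplies the reconstruction term before it is added to the classification loss. The exact loss used by the model is not restated in this section, the KL term of a variational auto-encoder is omitted, and all function names and toy values are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def reconstruct_adjacency(z):
    """Inner-product decoder: A_hat[i][j] = sigmoid(z_i . z_j)."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]

def reconstruction_loss(adj, adj_hat, eps=1e-12):
    """Mean binary cross-entropy between original and reconstructed adjacency."""
    total, count = 0.0, 0
    for row, row_hat in zip(adj, adj_hat):
        for a, a_hat in zip(row, row_hat):
            total -= a * math.log(a_hat + eps) + (1 - a) * math.log(1 - a_hat + eps)
            count += 1
    return total / count

def joint_loss(cls_loss, rec_loss, kappa=0.1):
    """Classification loss plus the kappa-scaled reconstruction loss (assumed form)."""
    return cls_loss + kappa * rec_loss

# Toy latent vectors and adjacency matrix, invented for illustration only.
z = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
adj = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
rec = reconstruction_loss(adj, reconstruct_adjacency(z))
total = joint_loss(cls_loss=0.5, rec_loss=rec, kappa=0.1)
```

With a small $\kappa$ the reconstruction term still shapes the latent representation but no longer dominates the gradient of the classification objective.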

Decision parameters

The tweet-word subgraph focuses on capturing the global semantic relationships of the text content, and the tweet-user subgraph focuses on exploring the information involved in the spread of the source tweet. We design the global feature screening mechanism and the local feature screening unit to select and filter the more effective features flexibly. To check the influence of the screening mechanism on rumor event detection, we evaluate the model with different coefficients $\eta$; the results are shown in Fig. 3. The model performs best on Twitter15 when $\eta =0.4$ and reaches its highest accuracy on Twitter16 when $\eta =0.3$. This may be because Twitter15 contains more data: when multiple features with larger differences are interactively fused, the attention mechanism produces a more distinguishable feature representation and therefore plays a more important role.
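A sensitivity study of this kind can be sketched as a grid search over $\eta$. The sketch assumes that $\eta$ acts as a convex weight between the output of the global screening branch and the output of the gated branch; this section does not restate the fusion formula, so that form is an assumption. The names `fuse`, `sweep_eta` and `toy_accuracy` are hypothetical, and the toy scoring function is invented rather than taken from Fig. 3.

```python
def fuse(global_feat, gated_feat, eta):
    """Convex combination: eta weights the global branch, (1 - eta) the gated branch."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return [eta * g + (1.0 - eta) * a for g, a in zip(global_feat, gated_feat)]

def sweep_eta(evaluate, candidates):
    """Return the (eta, score) pair with the highest score on a grid of candidates."""
    return max(((eta, evaluate(eta)) for eta in candidates), key=lambda pair: pair[1])

def toy_accuracy(eta):
    """Stand-in for a validation run; peaks at eta = 0.4 by construction."""
    return 1.0 - (eta - 0.4) ** 2

fused = fuse([1.0, 0.0], [0.0, 1.0], eta=0.4)
best_eta, best_score = sweep_eta(toy_accuracy, [i / 10 for i in range(1, 10)])
```

In practice `evaluate` would train or validate the model for each candidate value, and the sweep would be repeated separately for each dataset.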

Finally, we select several better sets of parameters for joint training to get the best performance, as shown in Table 4:

Table 4 Optimal parameter selection.

Ablation analysis

To verify the effectiveness of the different modules of the proposed model, we report the contribution of each component by removing it from the full model.

Importance analysis of subgraphs

We remove the tweet-word subgraph and the tweet-user subgraph from the model in turn, and use GAT and VGAE to model the remaining subgraph and learn node representations. Because each experiment uses the features of only one subgraph, we directly add the output features of GAT and GCN and feed them to the multi-feature decision-level decoder module for rumor detection.

The empirical results are summarized in Table 5. Combining the features of multiple graphs gives better detection than any single subgraph: in social networks, rumors are highly misleading and difficult to identify from a single characteristic. Specifically, the detection accuracy of the tweet-word subgraph is higher than that of the tweet-user subgraph, which suggests that textual semantic information is more important for rumor detection.

Table 5 The importance analysis of subgraphs.

Importance analysis of VGAE

For a more intuitive comparison, we perform a visualization task that lays out the heterogeneous graphs in a low-dimensional space. To explore the influence of global structure information on rumor detection, we remove the graph reconstruction module: we use GAT to model the tweet-word subgraph and the tweet-user subgraph separately and learn node features without reconstructing the event structure, keeping all other experimental settings unchanged. We visualize the outputs in a two-dimensional space by applying the $t$-SNE algorithm³⁷.

Figure 4 shows the results of the two variants on the Twitter15 and Twitter16 datasets. Both separate the different types of events (NR, FR, TR, UR) reasonably well, but VGAE-GAT performs better. Specifically, the points of Only-GAT are more scattered and irregular, and more event categories overlap one another, whereas the points of VGAE-GAT are distributed regularly, with smaller distances within the same class and larger distances between different classes. This suggests that using VGAE to learn the posterior distribution not only provides a more flexible graph generation model but also captures structural information better, yielding a better final representation.
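The visual impression of tighter clusters and wider gaps between classes can also be checked numerically. The sketch below is not part of the reported experiments: it compares the mean distance between points of the same class with the mean distance between points of different classes in a two-dimensional embedding. The function name and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations

def mean_pairwise_distances(points, labels):
    """Return (mean same-class distance, mean different-class distance)."""
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)  # Euclidean distance, Python 3.8+
        (intra if lp == lq else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Toy 2-D embedding, invented for illustration: two tight, well-separated classes.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = ["NR", "NR", "FR", "FR"]
intra_mean, inter_mean = mean_pairwise_distances(points, labels)
```

A lower intra-class mean together with a higher inter-class mean corresponds to the pattern described for VGAE-GAT. Because $t$-SNE does not preserve global distances faithfully, such a score computed on the 2-D layout is only a rough indication and is better computed on the original node representations.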

Importance analysis of multi-feature screening strategy

To further evaluate how well our model captures valuable features, we separately remove the decision-level global feature fusion strategy and the adaptive gated fusion strategy. Note that the break-even parameter $\eta$ no longer applies in this setting; we directly use the output of the remaining screening mechanism for rumor detection.

From the experimental results in Table 6, we make the following observations. Among the feature screening mechanisms, the attention-based global feature screening component yields a larger performance improvement than the gating-based adaptive feature screening component. Specifically, the accuracy of the decision-level global feature fusion strategy is 3.6% and 3.3% higher than that of the adaptive gated fusion strategy on the Twitter15 and Twitter16 datasets, respectively. Our full model combines the two fusion mechanisms, and its results support the rationale for the interactive fusion of multiple features.

Table 6 The importance analysis of subgraphs.

Conclusion

In this study, we propose a multi-feature interaction heterogeneous graph reconstruction method with semantic graph and user propagation graph constraints for rumor detection. This method makes full use of the difference between the textual semantic features in heterogeneous graphs and the structural features of user propagation. Specifically, we decompose the heterogeneous tweet-word-user graph into a tweet-word subgraph and a tweet-user subgraph, use GCN and GAT to learn text semantic information and user propagation representations, and apply VGAE to learn the overall structure representation. In addition, to select and utilize multi-graph global features effectively, we explore a multi-feature screening mechanism for detecting rumors, which learns more effective feature information through a multi-feature fusion strategy. Experimental results on two real datasets, Twitter15 and Twitter16, demonstrate the effectiveness and superiority of the proposed method, which achieves new state-of-the-art results. Ablation studies confirm the usefulness of the different parts of the model.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Castillo, C., Mendoza, M., Poblete, B. Information credibility on twitter. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 28 - April 1, 2011 (2011)
Potthast, M., Kiesel, J., Reinartz, K. et al. A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638 (2017).
Raj, C. & Meel, P. Convnet frameworks for multi-modal fake news detection. Appl. Intell. 51(11), 8132–8148 (2021).
Huang, Q., Yu, J., Wu, J. et al. Heterogeneous graph attention networks for early detection of rumors on twitter. In 2020 International Joint Conference on Neural Networks (IJCNN) 1–8 (IEEE, 2020).
Xu, S. et al. Rumor detection on social media using hierarchically aggregated feature via graph neural networks. Appl. Intell. 53, 1–14 (2022).
Yu, F. et al. A convolutional approach for misinformation identification. IJCAI 2017, 3901–3907 (2017).
Jing, M., Wei, G., Wong, K. F. Detect rumors in microblog posts using propagation structure via kernel learning. In The 55th annual meeting of the Association for Computational Linguistics (ACL, 2017).
Shu, K., Cui, L., Wang, S. et al. Defend: Explainable fake news detection. In KDD (2019).
Shu, K., Wang, S., Liu, H. Beyond news contents: The role of social context for fake news detection. In Proceedings of the twelfth ACM International Conference on Web Search and Data Mining 312–320 (2019).
Jing, M., Wei, G., Wong, K. F. Rumor detection on twitter with tree-structured recursive neural networks. In The 56th Annual Meeting of the Association for Computational Linguistics (2018).
Qazvinian, V., Rosengren, E., Radev, D. R. et al. Rumor has it: Identifying misinformation in microblogs. In Conference on Empirical Methods in Natural Language Processing (2011).
Yang, L., Wu, Y. Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).
Popat, K. Assessing the credibility of claims on the web. In International Conference (2017).
Veličković, P., Cucurull, G., Casanova, A. et al. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
Wu, L., Rao, Y. Adaptive interaction fusion networks for fake news detection. arXiv preprint arXiv:2004.10009 (2020).
Kwak, H., Lee, C., Park, H. et al. What is twitter, a social network or a news media? In Proceeding International Conference on World Wide Web (2010).
Lu, Y. J., Li, C. T. Gcan: Graph-aware coattention networks for explainable fake news detection on social media. arXiv preprint arXiv:2004.11648 (2020).
Shu, K., Zhou, X., Wang, S. et al. The role of user profiles for fake news detection. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 436–439 (2019).
Ke, W., Song, Y., Zhu, K. Q. False rumors detection on sina weibo by propagation structures. In IEEE International Conference on Data Engineering (2024).
Giudice, K. D. Crowdsourcing credibility: The impact of audience feedback on web page credibility. Proc. Am. Soc. Inf. Sci. Technol. 47(1), 1–9 (2010).
Liu, X., Nourbakhsh, A., Li, Q. et al. Realtime rumor debunking on twitter. In ACM (2015).
Monti, F., Frasca, F., Eynard, D. et al. Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673 (2019).
Rauber, P. E., Falcão, A. X., Telea, A. C. Visualizing time-dependent data using dynamic t-sne. In Eurographics (2016).
Shu, K., Mahudeswaran, D., Wang, S. et al. Hierarchical propagation networks for fake news detection: Investigation and exploitation. In Proceedings of the International AAAI Conference on Web and Social Media 626–637 (2020).
Liu, Y., Wu, Y. F. Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence (2018).
Kipf, T. N., Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
Lotfi, S. et al. Detection of rumor conversations in twitter using graph convolutional networks. Appl. Intell. 51(7), 4774–4787 (2021).
Ye, J. & Guo, J. Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation. Appl. Intell. 52, 1–10 (2023).
Yuan, C., Ma, Q., Zhou, W. et al. Jointly embedding the local and global relations of heterogeneous graph for rumor detection. In 2019 IEEE International Conference on Data Mining (ICDM) 796–805 (IEEE, 2019).
Bian, T., Xiao, X., Xu, T. et al. Rumor detection on social media with bi-directional graph convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence 549–556 (2020)
Ma, J., Gao, W., Wei, Z. et al. Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM international on Conference on Information and Knowledge Management 1751–1754 (2015).
Lin, H., Zhang, X., Fu, X. A graph convolutional encoder and decoder model for rumor detection. In 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) 300–306 (IEEE, 2020).
Kipf, T. N., Welling, M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016).
Wu, L. et al. Discovering differential features: Adversarial learning for information credibility evaluation. Inf. Sci. 516, 453–473 (2020).
Ma, J., Gao, W., Mitra, P. et al. Detecting rumors from microblogs with recurrent neural networks. In International Joint Conference on Artificial Intelligence (2016).
Khattar, D., Goud, J. S., Gupta, M. et al. Mvae: Multimodal variational autoencoder for fake news detection. In The World Wide Web Conference 2915–2921 (2019).
Ruchansky, N., Seo, S., Liu, Y. Csi: A hybrid deep model for fake news detection. In ACM (2023).

Funding

There was no funding.

Author information

Authors and Affiliations

School of Intelligent Manufacturing Engineering, Shanxi University of Electronic Science and Technology, Linfen, 041000, Shanxi Province, China
Yongping Liu, Jianliang Wang, Ming Yin & Chunjiang Zhao

Contributions

Liu Yongping is responsible for conceptualizing, original writing, and editing. Wang Jianliang is responsible for image processing and editing. Zhao Chunjiang is responsible for editing and reviewing. Yin Ming is responsible for English polishing.

Corresponding author

Correspondence to Yongping Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liu, Y., Wang, J., Yin, M. et al. Heterogeneous graph convolutional network for rumor detection with multi-level interactive fusion and graph reconstruction. Sci Rep 15, 31639 (2025). https://doi.org/10.1038/s41598-025-15740-z

Received: 26 January 2025
Accepted: 11 August 2025
Published: 27 August 2025
Version of record: 27 August 2025
DOI: https://doi.org/10.1038/s41598-025-15740-z
