Abstract
The evolution of telecommunication technologies not only enhances social interactions but also inadvertently fosters an environment for telecom fraud. Graph-like data generated from traceable telecommunication interactions offers a foundation for graph-based fraud detection. However, the complexity and dynamism of interaction networks present formidable challenges. Our data exploration revealed that fraudsters exhibit consistent distinguishability from normal users in specific dynamic behavioral traits. Leveraging previous research, we constructed a latent synergy network (LSN) through second-order relations. Analysis of LSN exposed that fraudsters, seeking to elude detection, adopt deceptive behaviors by establishing connections with numerous normal users, leading to the over-smoothing problem in traditional GNN models. Consequently, we introduce the Dynamic Pattern with Adaptive Filter Graph Learning framework for telecom fraud detection. With the sequential network, we capture users’ dynamic behavioral features for LSN input. Additionally, in LSN learning, we designed a trainable filter to capture differences between feature channels during information aggregation, mitigating the over-smoothing problem. On the Sichuan Telecom, Sichuan-mini Telecom, and YelpChi datasets, using AUC, Recall, and F1-score as evaluation metrics, DPGFD outperforms GCN, GraphSAGE, FRAUDRE, BWGNN, and GAGA by an average of 5%.
Similar content being viewed by others
Introduction
The proliferation of mobile phones worldwide not only facilitates social interactions and information exchanges but also inevitably provides the breeding ground for various telecom fraud activities. As Truecaller reported, more than 68 million telecom frauds occurred in 2022, causing 39.5 billion USD in economic loss, and less than 3% of these cases were resolved. Telecom fraudsters usually make a myriad of calls to induce callees to transfer their money to specified accounts, and they tend to camouflage and hide in huge volumes of calling records, which brings great difficulty for telecom fraud detection1.
Most prior work usually tackles the telecom fraud detection problem from either the sequence classification task (e.g., multivariate, variable-length time series anomaly detection)2 or network schema (e.g., graph measures and latent factor models)3,4 separately. However, the sequential and interactive behaviors are essentially interwoven as their connections are established by their consecutive behaviors, so they should be considered collectively for learning discriminable representations. Fortunately, telecom interaction is usually traceable, whether the user is normal or fraudulent. The traceable interaction can be naturally treated as graph-structure data where the users and their interactions are considered as the nodes and edges, respectively5. Recently, the emerging graph neural networks (GNNs) have achieved remarkable success in representation learning on graphs, which provides opportunities for graph-based telecom fraud detection methods. Nevertheless, there are still some challenges.
Graph dynamicity. As we know, telecom interaction is always continuous, which means structural and temporal dynamics on bipartite graphs, as shown in Fig. 1a (Note that the Fig. 1 was produced using Microsoft Visio 2019 6, with graphical elements sourced from publicly available materials on the internet). Such dynamic evolution of telephony social behaviors and network topology can complicate the telecom fraud detection task.
Graph Heterophily. As Tencent reported, there usually exist telecom criminal gangs and several different roles in a gang. To improve the success rate of fraud, they usually show synergy patterns, that is, alternately make calls to the victims. Based on this, we construct the latent synergy network (LSN) where the concerned callers with at least one of the same callees are linked, as shown in Fig. 1b. As a double-edged sword, while considering synergy fraud patterns, the interaction between normal callers may add many more connections between normal and fraudulent callers. Consequently, LSN tends to be a heterophilous graph where callers with different labels have more connections. However, traditional GNNs work well due to the homophilous assumption (i.e., connected nodes have similar features and the same labels) and act as a low-pass filter to aggregate information from local neighborhoods for commonality retaining. Therefore, this smoothing aggregation mechanism is obviously not suitable for LSN, as it may lead to fraudsters to conceal their fraudulent features within numerous connected normal users.
Telecom interactions demonstrate temporal continuity; however, existing methods lack a unified framework that concurrently models both dynamic calling behaviors and social structures. Moreover, telecom fraud groups often operate in a collaborative manner. To bridge this gap, we reconstruct a dense latent collaborative network from the original sparse communication records, thereby reframing telecom fraud detection as an anomaly classification task. Additionally, the connectivity patterns within telecom fraud networks are both complex and uncertain. Conventional graph neural networks operate under the heterophily assumption and employ fixed low-pass filters in convolution operations. This approach results in over-smoothing, hindering the effective differentiation of various user types in heterogeneous graphs.
To tackle above challenges, we proposed a Dynamic Pattern Graph Fraud Detection (DPGFD) framework for telecom fraud detection. Specifically, DPGFD is composed of four modules: (1) a multi-LSTM-based behavior encoder to learn the trend of dynamic behavior evolution of users at multiple time scales and get the dynamic behavior patterns from individual sequential behaviors; (2) an attention-based fusion module to adaptively combine dynamic behavior patterns obtained from several time scales for the intermediate representation of caller nodes; (3) a latent synergy network (LSN) extractor to take the influence of collaborating patterns into consideration so as to explore the second-order social structure information of fraudsters; (4) a adaptive filter based LSN learning module to embed nodes, which not only solves the issue of graph heterophily, but also takes into accounts the different frequency response filter employed to each signal channels.
To validate the performance of our model, DPGFD is compared with traditional methods, including GCN 7, GraphSAGE 8, FRAUDRE 9, BWGNN 10, and GAGA 11. The results demonstrate that DPGFD achieves excellent performance across three datasets: Sichuan Telecom, Sichuan-mini Telecom, and YelpChi. The contributions are summarized as follows:
-
1.
Through the analysis of real-world datasets, we discover the trend of heterophily of fraudsters in synergy networks, and discover the discriminability of fraudsters and normal users on several dynamic behavior features extracted from sequential CDRs.
-
2.
A dynamic behavior and latent synergy network learning method is proposed to capture local dynamic behavior patterns from individual sequential CDRs and global network structural information. In addition, an improved adaptive filter is further developed to aggregate information based on different channels and neighbors for the representations of nodes.
-
3.
Experiments conducted on public real-world datasets verify the effectiveness of our method for telecom fraud detection, and the various frequency responses on different types of users and each feature channel.
Related work
Here, we first summarize the development of telecom fraud detection methods. Secondly, we also make a summary of GNN-based detection methods for other types of fraud activities. Furthermore, we also make a review of the current research study about the graph heterophily issue.
Telecom fruad detection
Earlier effort in telecom fraud detection typically relied on rule-based or feature engineering-based methods. Specifically, Hilas et al.12 and Miramirkhani et al.13 exploited expert experience to manually set alarm thresholds and conditional rules of telecom fraud patterns. When it exceed the established normal range, the detection system will issue an alarm prompt. Another widely used family of methods is based on machine learning models to learn the differences in statistical behavioral characteristics of fraudsters and normal users. Murynets et al.14 collected telephone billing data from 500 fraudsters and 93,000 normal users of major cellular network operators in the U.S. and extracted static telephone social behavior features such as call duration, call length, dialing and answering percentages, and used machine learning models such as decision trees and random forests to jointly learn the fraudsters’ telephone fraud behavior patterns. Statistical behavioral feature-based approach discovers fraudsters by learning static behavioral features, however, lacking the consideration of data sequential feature differences, and user-to-user messaging learning, this approach may be unsuitable when fraudulent tactics change.
Considering the limitations of the rule-based method, some sequence-based or machine learning approaches are proposed. First, by treating consecutive call records as a sequence, telecom fraud detection is considered as an anomalous sequence detection task. Guo et al.2 proposed an LSTM variant (HAInt-LSTM) that captures the periodicity of phone behavior in a sequence by improving the forgetting gate of LSTM and introducing the before-and-after behavior interval control, and historical behavior attention mechanism. Lin et al.3 proposed a collective sequence and interaction model (COSIN) to uniformly model both sequential and interaction behaviors of users using a probabilistic graphical model. Subsequently, they further designed a Hawkes-enhanced sequence model (HESM)15 by introducing the Hawkes process into sequential behavior modeling to model periodic patterns. By uncovering the “precision fraud” strategy in telecom fraud activities, a probabilistic factor graph-based method (FFD)16 was proposed by defining four types of factor functions. Although the above methods have made some achievements by introducing sequential behavioral models and probabilistic graphical models, they still fail to explicitly exploit the difference in social connect patterns between fraudsters and benign users, which hinders them from achieving better performance.
Recently, some GNN-based telecom fraud detection methods have been proposed. Ji et al.4 proposed a multi-range gated graph neural network (MRG-GNN) by introducing the efficient short walk and node-merging strategy to capture content information and relation information between users. An end-to-end telecom fraud detection method (BTG)17 was proposed to deal with the problem of graph sparsity. To deal with the graph imbalance problem, they further proposed a cost-sensitive graph neural network (GAT-COBO)18. However, due to dataset access limitations and privacy protection, the study of GNN-based methods specially designed for telecom fraud detection is still under-explored.
Although some progress has been made, they inevitably fail to adequately consider both sequential and interaction behavior patterns of fraudsters simultaneously, and simplify the complexity of fraud graph networks to some extent.
Here we divide these works into three main categories based on their methods, as summarized in the Table 1. We can observe that most studies fail to fully take advantage of both sequential behavior and telecom social structure. Although there exist some GNN-based telecom fraud detection methods, they haven’t taken sequential behavior modeling into consideration and oversimplify the complexity of the fraudulent graph. To this end, it prompts us to develop a novel graph neural network with a dynamic behavior pattern encoder and an adaptive graph filter to deal with the above drawbacks for telecom fraud detection.
Graph-based fraud detection
Graph-based fraud detection focuses on the identification of anomalous nodes19,20,21. TGAT22 is a temporal attention mechanism to aggregate temporal-topological neighbor characteristics. BiDyn23 is a method for fraud detection in large-scale dynamic bipartite graphs. FRAUDRE9 is a variant of GNN that uses graph-independent embeddings to represent users. PC-GNN5 is a graph neural network with unbalanced supervised learning to solve the problem. BWGNN10 is a GNN variant with beta wavelet to capture not only the similarity but differences of nodes. Another genre is based on dynamic graphs to identify anomalous nodes. Such models define a score function to measure the normality of a node, find nodes with dramatic changes in normality at a certain time window when sliding time windows, and predict them as fraudsters24,25. TGAT22 investigated fraudsters’ evolving personal characteristics and topology, and proposed a temporal attention mechanism to aggregate temporal-topological neighbor characteristics. Andrew et al.23 aimed at the scarcity of fraud markers in large-scale networks and the complexity of combining time and network patterns, and proposed a method for fraud detection in large-scale dynamic bipartite graphs, while learning from limited training Summarized in the label. However, these methods only pay attention to a specific aggregation method due to the lack of consideration of graph heterophily, and do not dynamically adjust the aggregation method according to the heterophilous information between nodes and neighbors.
Heterophilic graph learning
Graph Neural Networks have proven to be useful for many different practical applications. However, many existing GNN models have implicitly assumed homophily among the nodes connected in the graph, and therefore have largely overlooked the important setting of heterophily, where most connected nodes are from different classes. In recent years, some works have studied the heterophily problems on graphs. Jiong et al.26 dealt with the graph heterophily issues by designing a combination of separation of ego- and neighbor-embeddings, higher-order neighborhoods, and intermediate representations to boost learning from the graph structure under heterophily. Jiong et al.27 proposed a framework generalizing GNNs to have homophilous or heterophilous graphs. It contains an interpretable compatibility matrix for modeling heterophilous or homophilous levels in graphs, which can be learned in an end-to-end manner, enabling it to go beyond the assumption of strong homophily. Lun et al.28 proposed a GNN model based on bi-kernel feature transformation and a selection gate, for capturing homophily and heterophily information respectively. Although the above methods have made some achievements, they are not specially designed for fraud detection to deal with complex connection patterns between fraudsters and normal users, which urges us to develop a modified GNN model to make it applicable to fraud or more complex scenarios.
Preliminary and problem formulation
Definitions
Definition 1
Dynamic Bipartite Graph (DBG) is the graph denoted as \(\mathbf {G_B} = ( {\mathbf{U},\mathbf{V},\mathbf{E},{\mathbf{X}_{U}},{\mathbf{Y}_{U}}} )\) where \(\mathbf{U} \in \mathbb {R}^{N \times 1}\) is the set of concerned callers, \(\mathbf{V} \in {\mathbb {R}^{M \times 1}}\) is the set of callees. \(e = \left( {u,v,t,{\mathbf{f}}} \right) \in \mathbf{E}\) is a connection (telecom interaction or event) between \(u \in \mathbf{U}\) and \(v \in \mathbf{V}\) at the timestamp t and \({\mathbf{f}}\) refers the attributes of this interaction. \({\mathbf{X}_U} \in {\mathbb {R}^{N \times d}}\) denotes the d-dimension feature vector of all N caller nodes but no features for callee v. \({\mathbf{Y}_U}\in \mathbb {R}^{N}\) is the labels for each caller where \({y_i} = 1\) represents the telecom fraudster and \({y_i} = 0\) represents the normal user. \(N_{DBG}\left( u \right) = \left\{ {v\left| {\left( {u,v,t,{\mathbf{f}}} \right) \in \mathbf{E},v \in V} \right. } \right\}\) denotes the set of temporal neighbors of caller u.
Definition 2
Graph Heterophily is to describe the degree of differences between the labels of connected nodes in a graph. If connected nodes \(u_i\) and \(u_j\) have different labels, we call \(e_{ij}\) a heterophilic edge and if they have the same label, we treat it as a homophilic edge. The larger graph heterophily, the more heterophilous edges exist in the graph.
Definition 3
Calling Event Sequence is a chronological sequence of user’s telecom events denoted as \(Evt^{u} = \left\langle {evt_1^u,evt_2^u,\ldots ,evt_T^u} \right\rangle\) where \(evt_i^u = \left( {u,{v_i},{t_i},{{\mathbf{f}}_i}} \right)\) and T is the length of the sequence.
Dataset description
We use two real telecom fraud datasets collected from Sichuan Province, China. They are called Sichuan Telecom and Sichuan-mini Telecom datasets, respectively. Specifically, the Sichuan Telcom dataset contains about 5.02 million call records from August 2019 to March 2020, involving about 1.99 million users. The Sichuan-mini Telecom dataset contains about 0.11 million call records in April 2020, involving about 46.91 thousand users. More information will be detailed later. Each call record contains caller ID, callee ID, start time, call duration, and call type. All phone numbers are processed to the identification number for privacy protection.
Experimental investigation
In this section, we will conduct an investigation into the discriminability of several dynamic behavior features between normal users and fraudsters, the existence of latent synergy in the Sichuan Telecom dataset, and the problem statement.
Dynamic behavior pattern discriminability. To explore the dynamic differences in the selection of call targets between fraudsters and normal users, we build the set of calling targets for every user in each time slice. Then we calculate the average repetition rate of calling target sets selected by fraudsters and users in adjacent (first-order) or spaced (second-order) time slices, respectively. The result is shown in Fig. 2a, b. It can be observed that the one-order and two-order repetition rates of fraudsters are significantly below the normal users. That may be because fraudsters are more likely to call many different people in a broad search for potential victims, while normal users are generally limited by the social circle and living environment, they tend to make calls to the familiar relatives and friends. Furthermore, we calculated the rates of calling times to each target to the total records for both fraudsters and normal users, and they are defined as energy dispersion. It can be found that the energy dispersion of fraudsters is much smaller than the normal ones, as shown in Fig. 2c. Based on Fig. 2a–c, we can observe that the fraudsters adopt a prompt and effective strategy, i.e., quickly making a massive number of calls to different people in a short period of time, which is significantly different from normal telephone behavior patterns.
Based on the above stable discrimination in the dynamic features \(\mathbf{f}^{u,k}\), we build several calling slice sequences \({{{\mathbf{s}}}^{u,k}}\) from different time scales \(\tau\):
where \(T^{\prime }\) is the length of sequence split by k-th time scale. Then the total chronological dynamic behavior features matrix \({{{\mathbf{s}}}^u}\) from \(\tau\) time scales can be defined as follows:
Latent synergy evidence. Second, we also explore the latent synergy relationship in fraud activities. Specifically, we argue that if there exist synergistic relationships between two fraudsters, then the two ones definitely defraud the same victim together. Based on this, we first counted the number of the same victims who were selected by one fraudster and his partner, and then performed histogram statistics on these numbers. The result is shown in Fig. 3, where the x-coordinate represents the number of victims who may be co-frauded, and the y-coordinate is the corresponding proportion of fraudsters. In the figure, we can find that only 10\(\%\) of fraudsters conduct fraudulent activities alone, and more than 50\(\%\) of fraudsters have more than five common fraud targets, which implies that there exist prevalent co-fraud behaviors between fraudsters. To make full use of synergistic relationships to help fraud detection, we build a latent synergy network (LSN) which makes an edge between callers if they have common callees, and it can be represented as \({G}_L=\left( U, E_L \right)\), where U are the set of callers, \(E_L\) is the set of edges.
Problem statement. In this paper, the telecom fraud detection task is phrased as a binary classification problem. Considering a dynamic bipartite graph and calling event sequences which are formed by user\(^\prime\) call detail records \({{\mathbf{s}}}^u\), we learn a function \(f\left( \cdot \right)\) to map the caller nodes into a low-dimension feature matrix \(\mathbf{Z}\). Given the learned representations of callers \(\mathbf{Z}\), given some labeled nodes and the built latent synergy network G, a classifier is trained to predict the fraud probability of remaining users in the testing set, and the formal formulation can be written as follows:
where Y is the result, classifier is the classification function, and Z is the feature matrix.
Methods
Model overview
In this section, we will present our proposed method DPGFD, as shown in Fig. 4. DPGFD consists of four main modules, (1) a latent synergy network extractor for LSN construction; (2) a multi-LSTM based encoder for dynamic calling behavior patterns learning; (3) an improved GNN based on the edge-aware frequency adaptive filter; and (4) a classifier for fraud detection. Specifically, the network extractor is first used to conduct LSN construction as mentioned above. Then, the multi-LSTM based behavior encoder is introduced for dynamic calling behavior pattern learning at multiple time scales. Next, an attention-based fusion module is used to learn the corresponding weights to the dynamic behavior patterns of each scale for feature fusion. For graph learning, an edge-aware frequency adaptive graph filter is designed to deal with graph heterophily issues by adjusting both high- and low-frequency components on each feature channel during aggregation. Finally, a classifier is defined for model training and fraud detection.
The four modules in DPGFD constitute the core of the framework. The Multi-LSTM-based Encoder captures dynamic behavioral features in users’ sequential call behaviors, offering essential temporal insights for fraud detection. The Multiscale Feature Fusion module adaptively integrates features across different time scales, allowing the model to better capture key information and improve feature representation. The LSN Extractor utilizes collaborative relationships to identify fraudsters’ cooperative patterns from a global perspective, augmenting individual behavioral features with social structural insights. The Frequency Adaptive Filter mitigates graph heterogeneity by dynamically capturing commonalities while preserving discriminative information during aggregation, optimizing node embeddings and enhancing the model’s ability to differentiate various types of nodes. These four modules function complementarily and are tightly integrated, working together to provide more comprehensive and accurate information for fraud detection.
Latent synergy network extractor
As introduced above, LSN is constructed by making connections between two callers who have more than one common callee. The adjacency matrix of the LSN is calculated as follows:
where \({{\mathcal{N}_{DBG}}\left( {{u_i}} \right) }\) and \({{\mathcal{N}_{DBG}}\left( {{v_i}} \right) }\) denotes the neighbor set for user \(u_i\) and \(u_j\).
Multi-LSTM based encoder
In this section, we aim to capture the calling behavior features from callers’ sequential calling behaviors. As for the sequence processing problem, it is natural to use an RNN-based network. Considering the length of the calling feature sequence and demand for memory ability, we use LSTM to encode sequential calling behaviors from calling slice sequences \({{{\mathbf{s}}}^{u,k}}\) and learn the dynamic behavior pattern \({{{\mathbf{z}}}_{u,k}}\), and it is calculated as follows:
Additionally, we use multiple time scales to get several calling slice sequences, which can better adapt to telecom fraud detection:
where \(\hat{{\mathbf{z}}}_u\) denotes the dynamic behavior pattern.
Attention-based fusion module
Considering the dynamic behavior pattern feature vectors (i.e., \({{{\mathbf{z}}}_{u,1}},{{{\mathbf{z}}}_{u,2}}, \ldots ,{{{\mathbf{z}}}_{u,\tau }}\)) of different time scales make different contributions, hence a learnable weight parameter is assigned to each one. The importance of each dynamic behavior pattern feature vector \({{{\mathbf{z}}}_{u,k}}\) is denoted as:
where \(\hat{a_k}\) is the attention weight, \(\left\langle { \cdot , \cdot } \right\rangle\) denotes the inner product, d is the dimension of \({{\mathbf{z}_{u,k}}}\), \({f_1}\left( \cdot \right)\) and \({f_2}\left( \cdot \right)\) are the feed-forward neural networks to map the input feature vector to a new vector representation. The final attention weight of each type of calling dynamic behavior feature is obtained by normalizing the attention value with softmax function as follows:
Finally, we can fuse these behavior feature vectors to obtain the dynamic calling behavior patterns as follows:
So far, we have derived the representation \(\mathbf{z}_u\) as the local calling behavior feature for the caller user. For the training of the dynamic behavior pattern of each user, we design the loss function for individual feature learning. Therefore, the dynamic behavior pattern is fed to the fully connected layer and classifier to obtain the individual feature loss:
where \(\mathbf{Z}=\{\mathbf{z}_u\mid u\in \mathbf{U}\}\in \mathbb {R}^{N\times d}\), \(\mathbf{W}\) is a learnable parameter matrix, \(\sigma\) is the activation function, and classifier can be softmax or sigmoid to classify each user to normal one or fraudster. Then the loss function of individual features can represented as:
where \(\mathscr {D}\) is the training set, \(\hat{y}_u^{ind}\) is the prediction of user u in \(\hat{\mathbf {{Y}}}_{ind}\).
Graph leraning besed on edge-aware frequency adaptive filter
Based on observed latent synergistic relationships in fraudulent activities, we construct LSN to make full use of synergistic relationships for fraud detection from the perspective of global connections. So far, we have obtained LSN and learned calling behavior features \(\mathbf{z}_u\) for caller node u, we need to design a graph learning network to re-embed the caller node to capture information from both node features and social structures. Traditional GNNs work based on the homophily assumption and act as the low-pass filter to aggregate information from local neighborhoods for commonality retaining. However, this smoothing aggregation mechanism is obviously not suitable for LSN, as it has been verified in section “Experimental investigation” that the social structure of LSN is of both heterophily and homophily and two types of users behave differently. To this end, an edge-aware frequency adaptive graph filter is developed by taking advantage of both high- and low-frequency components on each feature channel, which helps adaptively assimilate commonality and discriminate differences during information aggregation. Next, we will introduce it in detail.
Basic graph convolution filter. Following GCN, from the perspective of graph signal processing, given the one-channel signal \(\mathbf{x}\in \mathbb {R}^{N}\) of N nodes and adjacency matrix \(\mathbf{A} \in \mathbb {R}^{N\times N}\), the Laplacian matrix is \(\mathbf{L}=\mathbf{D}-\mathbf{A}\), where \(\mathbf{D}=diag\left( d_1,\ldots ,d_N\right)\) is the degree matrix (\(d_i=\sum _j \mathbf{A}_{i,j}\)). The symmetric Laplacian matrix is defined as: \(\mathbf{L}_{sym} =\mathbf{I}-\mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}} =\mathbf{U}{\Lambda } \mathbf{U}^{T},\) where \(\mathbf{U}\in \mathbb {R}^{N\times N}=[\mathbf{u}_1, \ldots , \mathbf{u}_N]\) is the eigenvectors of \(\mathbf{L}_{sym}\) and \({\Lambda }=diag(\lambda _1, \ldots , \lambda _N)\) is the eigenvalues. According to the research of29, the eigenvector matrix of the normalized Laplace matrix is used for graph Fourier transform, and it is defined as \(\mathscr {F}(\mathbf{x})=\mathbf{U}^T \mathbf{x}\). The graph convolution of \(\mathbf{x}\) with filter \(\mathbf{g}=diag(\mathbf {\theta })\) parameterized be \(\mathbf {\theta }\in \mathbb {R}^{N}\) is defined as \(\mathbf{x} *_G \mathbf{g}= \mathbf{U} \mathbf{g}(\Lambda ) \mathbf{U}^T \mathbf{x}\). The majority of existing work uses a fixed low-pass filter for graph convolution. Recent studies have shown if the input signal is repeatedly convoluted with the low-frequency component, the high-frequency component will be greatly weakened, resulting in the well-known over-smoothing problem. In addition, the low-pass filter can aggregate the homophilic signal from neighbors, and assimilate the connected nodes even if they have different features and labels, which would inevitably introduce some noise during information aggregation.
Edge-aware frequency adaptive filter. In image processing tasks, the Laplacian kernel is widely used to capture high-frequency signals. As its counterpart in the graph signal process (GSP), we can also multiply the Laplacian matrix \({\widetilde{{\mathbf{L}}}_{sym}}\) with graph signal input \(\mathbf {{x}}\) to characterize the high-frequency features, which can be treated as the high-pass filter in GSP, and the graph filter can be rewritten as follows:
where \(\mathbf{I}\) denotes the identity matrix, \(\tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}\) is the self-loop augmented adjacent matrix. Here, \({\widetilde{{\mathbf{L}}}_{sym}}{{\mathbf{x}}}\) is used to denote its high-frequency components, and Eq. (12) emphasizes how much information of input signal \(\mathbf{x}\) and high-frequency component \({\widetilde{{\mathbf{L}}}_{sym}}{{\mathbf{x}}}\) involved in aggregation to derive the output \(\widehat{{\mathbf{x}}}\).
Since that, a learnable weight parameter \(\phi\) can be introduced to adaptively adjust how many high-frequency components are assigned for information aggregation, and it is defined as follows:
where \(\hat{\mathbf{x}}\) denotes the one-channel signal.
If we generalize Eq. (13) to multi-channel signals, which is denoted as \(\mathbf{X}\in \mathbb {R}^{N\times C}\), the operation can be represented as:
where \({\Phi }=diag(\phi _1, \ldots , \phi _F)\) denotes the learnable parameter to adjust C feature channels so that each channel is , \(\hat{\mathbf{X}}\) denotes the representation matrix of node after filtering.
Furthermore, given \(w = \left[ {{\phi _1}, \ldots ,{\phi _F}} \right]\), which controls the aggregation of signals from different channels, considering the unique heterophily of each edge, it should be dynamically changeable with different node pairs. Based on this, for message aggregation between pair nodes \(v_i\) and \(v_j\), the learnable \({w_{ij}} \in {R^{1 \times F}}\) can be calculated as follows:
where \({{{{\mathbf{X}}}_i}}\) and \({{{{\mathbf{X}}}_j}}\) are the representation of node \(v_i\) and \(v_j\), || denotes the concatenation, \({{\mathbf{W}}} \in {\mathbb {R}^{2F \times F}}\) is treated as a shared convolution kernel, \({\sqrt{2F} }\) is a scaling factor, and \(\tanh \left( \cdot \right)\) limits each value in \(w_{ij}\) in the range of -1 and 1.
Based on the above discussion, we can input L layers of filter operations to form a more powerful filter. Specially, let \(\mathbf{H}^{(0)}=\sigma (\mathbf{Z} \cdot \mathbf{W}_0)\), then we have
where \(\sigma\) is activation function, \({\Theta }\) is the learnable parameter matrix. After L layers, the output embedding of nodes is denoted as \(\mathbf{H}_i^{(L)}\).
Model training and inferring
After the final layer, the node representations are further fed into a classifier like Softmax or Sigmoid and obtain the final probability in normal users or fraudsters. Specifically, it can be calculated as:
For model training, we use the cross entropy loss of label prediction as part of the loss.
then the overall loss is the combination of label prediction loss and individual feature loss, which can be represented as:
where \(\gamma \in [0, 1]\) is the balance weight. The training algorithm is summarized in Algorithm 1.
Theoretical spectral analysis of frequency adaptive filter
Here, we provide a spectral analysis of proposed graph filter with L layers by omitting weight matrix \(\Theta\) and activation function \(\sigma \left( \cdot \right)\). Meanwhile, to simplify the formal expression, we only deploy a set of learnable shared filter parameters at l-th layer \({\Phi _l} = {\text {diag}}\left( {\phi _{1,l},\phi _{2,l},\ldots ,\phi _{F,l}} \right)\) without considering the difference between different node pairs. Given the j-th channel signal \(\mathbf{x}_j\in \mathbb {R}^N\), Eq. (13) can be written as follows:
where \({\varvec{\tilde{\Lambda }}} = diag\left( {{{\tilde{\lambda }}_1},{{\tilde{\lambda }}_2},\ldots ,{{\tilde{\lambda }}_i},\ldots {{\tilde{\lambda }}_N}} \right)\) is the eigenvalues matrix. Here, the frequency response of adaptive filter can be represented as:
By stacking L layers, the output can be derived as follows:
where the overall frequency response can be considered as \({f_L}({{\tilde{\lambda }}_i},{\phi _j}) = \prod \limits _{l = 1}^L {(1 - {\phi _{j,l}}{{\tilde{\lambda }}_i})}\). Compared the GCN whose frequency response for L layers is \({f_L}({{\tilde{\lambda }}_i}) = {\left( {1 - {{\tilde{\lambda }}_i}} \right) ^L}\), the proposed adaptive filter can not only adjust the importance of high-frequency and low-frequency components at each layer by the learnable parameter \(\phi\) but also decouple each feature channel with great flexibility to learn more complex graphs.
Experiments
Experiment settings
Datasets. We adopt 3 real-world datasets previously used in research to evaluate DPGFD.
-
Sichuan Telecom: It includes 5,015,430 call records collected in eight months, from August 2019 to March 2020, for selected 6,106 phone users in Sichuan Province, China. Fraudsters have been marked by the operator. Note that all relevant data has been strictly encrypted to prevent information leakage. It is worth noting that, Sichuan Telecom dataset is the largest publicly available dataset we collected, where unprecedented scale and temporal coverage (including seasonal variations) could collectively support the research for modeling complex fraud behaviors.
-
Sichuan-mini Telecom: It includes 111,783 call records collected from 818 users in Sichuan Province during one month (i.e., April 2020). It follows the same privacy policy as introduced above.
-
YelpChi: It is a wildly used fraud detection graph dataset where the nodes are the collected normal and scam reviews on hotels and restaurants, and the task is to detect the scam ones.
As introduced above we first construct two LSNs based on the Sichuan Telecom and Sichuan-mini Telecom datasets, respectively. The statistics of the three datasets are shown in Table 2. Referring to related work, the training set, validation set and test set are split by 4:2:4, respectively.
Baselines. Several state-of-the-art methods are used to verify the effectiveness of our method for telecom fraud detection. A brief description of the methods is as follows:
-
GCN7 is a general GNN model which aggregates feature information of nodes’ first-order neighbors.
-
GraphSAGE8 is a general inductive framework that leverages node feature information to efficiently generate node embeddings for previously unseen data. It learns a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood.
-
FRAUDRE9 is an improved GNN method that focuses on the graph inconsistency and imbalance issues of the camouflaged fraudsters.
-
BWGNN10 is an improved GNN with spectral and spatial localized band-pass filters to better handle the ‘right-shift’ phenomenon in anomalies.
-
GAGA11 is a novel group aggregation enhanced transformer, which uses group aggregation to cope with the graph heterophily issue.
Evaluation metrics
In the paper, all experiments are conducted by Pytorch 1.9.0 with Python 3.8 on Ubuntu 20.04.1 with NVIDIA A100 and 128G DRAM. we use AUC, Macro-Recall, and Macro-F1 for performance evaluation.
Firstly, AUC is computed based on the relative ranking of prediction probabilities of all samples, so it could eliminate the influence of class imbalance, and it can be calculated as follows:
where \({ {{\mathcal{V}^ + }} }\) and \({ {{\mathcal{V}^ - }} }\) denote the positive and negative sets respectively, \(rank_i\) represents the rank of sample i from the postive set based on its prediction score. In short, given a pair of the arbitrarily chosen positive class sample and negative one, it measures how much probability that the model can correctly derive a score for the positive class sample (fraudster) that is greater than the score for the negative one (normal user).
Secondly, Recall represents the ratio of instances, which is not only the true positive samples but also correctly classified as the positive ones. It can be written as follows:
where \({\text {TP}} + {\text {FN}}\) denotes the total number of fraudsters in the test set, and TP represents the number of correctly predicted fraudsters. However, Recall only considers the prediction of positive samples (fraudsters), whereas the misclassification of normal users is ignored. Hence, Macro-Recall is introduced which represents the unweighted mean of Recall for each class.
Thirdly, F1-score is a commonly used evaluation metric in classification models, used to comprehensively measure the balance between precision and recall performance. The formula is as follows:
where \({\text{Precision}} = {\frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}}}\) represents the proportion of samples predicted to be positive classes that are actually positive classes, and Recall has been defined in Eq. (24). Follwing the same setting as Recall, Macro-F1 is introduced to represent the unweighted mean of F1-score for each class.
Overall, the larger the values of AUC, Macro-Recall and Macro-F1, the better performance the model makes.
Graph heterophily evidence
Here we will verify the evidence of graph heterophily and shows the different connection modes of fraudsters and normal users in the reconstructed LSN from the Sichuan Telecom dataset. Specifically, graph heterophily is introduced to describe the degree of differences between the labels of connected nodes in a graph, as defined in Definition 2. Based on this, we first calculated the proportion of heterophilic edges of each user node in LSN, and then performed histogram statistics statistics of proportions for fraudsters and normal users, respectively. The result is shown in Fig. 5, we can observe that a significant number of fraudulent nodes in the LSN exhibit a higher heterophilic connection pattern, i.e., more than half of the fraudsters having a heterophily ratio greater than 60%. In contrast, the normal user shows lower heterophilic connection patterns. Therefore, fraudsters and normal users would exhibit complex connection patterns in various social networks. However, traditional GNNs perform well based on the assumption that connected nodes have similar features and labels, while they are not used for all fraudulent social networks. To overcome this, an improved GNN should be proposed to assimilate nodes with homophilic connections and discriminate nodes with heterophilic ones, which will help the model better adapt to various fraudulent social graphs.
Performance comparison and ablation study
Performance comparison. In this section, we compare our proposed DPGFD and state-of-the-art methods on the Sichuan Telecom, Sichuan-mini Telecom and YelpChi datasets. The result is shown in Table 3, and we have the following observations. First, for traditional graph neural network methods, GCN and GraphSAGE are not specially designed for biased dynamic bipartite graph learning, but also neglect graph heterophily issues. Therefore, it doesn’t perform well. Second, FRAUDRE, BWGNN and GAGA are state-of-the-art methods that are specially designed for graph-based fraud detection. By introducing similarity measurement and fraud-aware modules, FRAUDRE performs better than traditional GNNs overall. Suffer from graph heterophily issues caused by complex connection patterns, FRAUDRE is prevented from getting further improvement. To this end, by developing multiple spectral band-pass filters and a new label propagation strategy to deal with graph heterophily issues, BWGNN and GAGA outperform the other methods. Third, DPGFD has significant superiority in all metrics on three datasets, which can be attributed twofold. On the one hand, DPGFD can not only capture the local features from individual calling sequential behaviors but also explore the synergy patterns of telecom fraudsters from the global social structure. On the other hand, compared with the most competitive methods (BWGNN and GAGA) which resort to multiple bandpass filters to fit the whole graph, DPGFD is more sensitive to the heterophilic connections using a gate-like strategy.
Ablation study. In this section, we also perform ablation studies to validate the contribution of two main sub-modules of our proposed method. Specifically, we first introduce DPGFD-w/o-DP which removes the LSTM-based encoder and replaces it with static statistical calling behavior features for LSN learning. DPGFD-w/o-DP removes the edge-aware frequency adaptive filter and replaces it with a fixed filter for each layer. The result is also shown in Table 3. We can observe that both DPGFD-w/o-DP and DPGFD-w/o-DP show good performance, which exhibits the effectiveness of such two sub-modules. In contrast, DPGFD-w/o-DP performs better, which means dealing with graph heterophily issues by introducing the edge-aware frequency adaptive filter can make more contributions.
Case study of adaptive frequency response
In this section, we will visually display different frequency responses between different feature channels for information aggregation on heterophilic and homophilic edges. Specifically, we first obtained the aggregation information for each edge of the LSN at the last aggregation of the adaptive filter. Then, we divide all edges into three types, ie N2N (connections between normal users,), F2F(connections between fraudsters) and N2F(connections between normal users and fraudsters). Here, to facilitate the visual presentation, we set the number of feature channels C to 16. Next, we calculate the average value of \(w_{ij}\) for each type of edge, and the result is shown in Fig. 6. We can observe that it shows different frequency responses between different feature channels (such as channels 0, 3, 9, etc) on such three types of edges. But for channels 2, 4, 7 and 15, they show similar frequency responses. Therefore, there exists the complex connection pattern in fraud detection, which urges us to deal with information aggregation on edges.
Display of different frequency responses between different feature channels on three types of edges. The x-axis represents different signal channels, i.e., from channel 0 to channel 15. The y-axis represent frequency response value of each channel. “N2F” denotes the edges between normal users and fraudsters, “N2N” denotes the edges between normal users, “F2F” denotes the edges between fraudsters.
Parameter analysis
In this section, we will investigate the effects of hyper-parameters (i.e., the number of signal/feature channels C and the balance weight value loss \(\gamma\)). Here we perform relevant experiments on the Sichuan Telecom dataset, and similar results can also be found in the other two datasets.
Number of signal channels C. First, we study the effect on the number of signal channels C. Specifically, we range C from 2 to 64, and the result is shown in Fig. 7a. We can observe there is a gradual performance improvement when C grows from 2 to 16, which means the increase of C makes it gradually fit the distributions of the dataset. When \(C > 16\), there is some reduction in performance, and it may be because a larger channel number C would introduce some redundant information for signal aggregations. Considering it, we final set C of 16.
Loss balance weight \(\gamma\). Second, we investigate the effects of \(\gamma\) which controls the importance weight of two different loss terms. Specifically, we vary the value of \(\gamma\) in the range of 0.1 to 1 step by 0.1, and the result is shown in Fig. 7b. We can observe that as the \(\gamma\) value increases, the performance grows gradually early and fluctuates at the later stages. Taken overall, when \(\gamma\) is 0.5, it achieves the optimal performance, which means that the information of both individual dynamic behavior patterns and synergistic social patterns make a relatively equal contribution to fraud detection. Therefore, we chose \(\gamma\) of 0.5 to combine two losses.
Conclusion
Telecom fraud detection is a challenging task due to the complexity, dynamic nature, and large scale of fraudulent activities in the network. In this paper, we propose a novel approach to transform the telecom fraud detection problem, and the contributions of it can be summarized in three-fold. First, we introduced a referenceable method of how to construct a dense graph network from call records; Second, we proposed a uniform framework, which integrates both sequential behavior patterns and telecom social connection patterns; Third, an edge-aware frequency adaptive graph filter was developed to deal with the graph heterophily issue. Experimental results on real-world datasets demonstrate the effectiveness of our method.
Although DPGFD has achieved significant improvements, there is still some room to improve. In fact, sequential behavior and interaction structures are intertwined, leading to the gradual evolution of graph structures over time. Therefore, we will focus on a novel graph learning method that can adaptively update the representations of nodes with the dynamic changes of graphs. Simultaneously, the proposed DPGFD can be expected to be used for recommendation systems and relationship inference.
Data availability
The call detail records used in this paper have been carefully decrypted to ensure there is no risk of information leakage. Researchers can download and train on the corresponding data by visiting https://aistudio.baidu.com/datasetdetail/40690/0 or request it for free by contacting wjh1992@xju.edu.cn.
References
Jiang, Y., Liu, G., Wu, J. & Lin, H. Telecom fraud detection via Hawkes-enhanced sequence model. IEEE Trans. Knowl. Data Eng. 35, 5311–5324 (2022).
Guo, J., Liu, G., Zuo, Y. & Wu, J. Learning sequential behavior representations for fraud detection. In 2018 IEEE International Conference on Data Mining (ICDM), 127–136 (IEEE, 2018).
Lin, H. et al. Fraud detection in dynamic interaction network. IEEE Trans. Knowl. Data Eng. 32, 1936–1950 (2019).
Ji, S., Li, J., Yuan, Q. & Lu, J. Multi-range gated graph neural network for telecommunication fraud detection. In 2020 International Joint Conference on Neural Networks (IJCNN), 1–6 (2020).
Liu, Y. et al. Pick and choose: A GNN-based imbalanced learning approach for fraud detection. In Proceedings of the Web Conference 2021, 3168–3177 (2021).
Microsoft. Microsoft visio 2019 (2019-10-12).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, vol. 30 (2017).
Zhang, G. et al. Fraudre: fraud detection dual-resistant to graph inconsistency and imbalance. In 2021 IEEE International Conference on Data Mining (ICDM), 867–876 (IEEE, 2021).
Tang, J., Li, J., Gao, Z. & Li, J. Rethinking graph neural networks for anomaly detection. In International Conference on Machine Learning, 21076–21089 (PMLR, 2022).
Wang, Y. et al. Label information enhanced fraud detection against low homophily in graphs. In Proceedings of the ACM Web Conference 2023, 406–416 (2023).
Hilas, C. S. Designing an expert system for fraud detection in private telecommunications networks. Expert Syst. Appl. 36, 11559–11569 (2009).
Miramirkhani, N., Starov, O. & Nikiforakis, N. Dial one for scam: Analyzing and detecting technical support scams. In 22nd Annual Network and Distributed System Security Symposium (NDSS), vol. 16 (2016).
Murynets, I., Zabarankin, M., Jover, R. P. & Panagia, A. Analysis and detection of simbox fraud in mobility networks. In IEEE INFOCOM 2014-IEEE Conference on Computer Communications, 1519–1526 (IEEE, 2014).
Jiang, Y., Liu, G., Wu, J. & Lin, H. Telecom fraud detection via Hawkes-enhanced sequence model. IEEE Trans. Knowl. Data Eng. 35, 5311–5324 (2023).
Yang, Y. et al. Mining fraudsters and fraudulent strategies in large-scale mobile social networks. IEEE Trans. Knowl. Data Eng. 33, 169–179. https://doi.org/10.1109/TKDE.2019.2924431 (2021).
Hu, X. et al. BTG: A bridge to graph machine learning in telecommunications fraud detection. Future Gener. Comput. Syst. 137, 274–287 (2022).
Hu, X. et al. GAT-COBO: Cost-sensitive graph neural network for telecom fraud detection. IEEE Trans. Big Data 10, 528–542 (2024).
Akoglu, L., McGlohon, M. & Faloutsos, C. Oddball: Spotting anomalies in weighted graphs. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 410–421 (Springer, 2010).
Wang, J., Wen, R., Wu, C., Huang, Y. & Xiong, J. FdGars: Fraudster detection via graph convolutional networks in online app review system. In Companion Proceedings of the 2019 World Wide Web Conference, 310–316 (2019).
Dou, Y. et al. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 315–324 (2020).
Xu, D., Ruan, C., Korpeoglu, E., Kumar, S. & Achan, K. Inductive representation learning on temporal graphs. arXiv preprint arXiv:2002.07962 (2020).
Wang, A. Z. et al. Bipartite dynamic representations for abuse detection. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 3638–3648 (2021).
Sun, J., Faloutsos, C., Papadimitriou, S. & Yu, P. S. Graphscope: parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 687–696 (2007).
Chelmis, C. & Dani, R. Assist: Automatic summarization of significant structural changes in large temporal graphs. In Proceedings of the 2017 ACM on Web Science Conference, 201–205 (2017).
Zhu, J. et al. Beyond homophily in graph neural networks: Current limitations and effective designs. Adv. Neural. Inf. Process. Syst. 33, 7793–7804 (2020).
Zhu, J. et al. Graph neural networks with heterophily. In Proceedings of the AAAI Conference on Artificial Intelligence 35, 11168–11176 (2021).
Du, L. et al. GBK-GNN: Gated bi-kernel graph neural networks for modeling both homophily and heterophily. In Proceedings of the ACM Web Conference 2022, 1550–1558 (2022).
Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A. & Vandergheynst, P. The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30, 83–98 (2013).
Acknowledgements
We first gratefully acknowledge anonymous reviewers who take their valuable time to read this draft and make any helpful suggestions. This research was funded by the Science and Technology Tackling in Henan Province (Nos. 242102210072, 252102210060, 252102210166) and the National Natural Science Foundation of China No.U25A700028.
Author information
Authors and Affiliations
Contributions
J.Y. and J.W. conceptualized the workflow of this paper. S.L. conducted the formal analysis required. Zijun Huang contributed to the data interpretation and discussion of the results. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, J., Li, S., Huang, Z. et al. An improve fraud detection framework via dynamic representations and adaptive frequency response filter. Sci Rep 15, 19051 (2025). https://doi.org/10.1038/s41598-025-02032-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-02032-9