Semantic ECG hash similarity graph

Fang, Yixian; Zhang, Shilin; Ren, Yuwei

doi:10.1038/s41598-025-07838-1

Article
Open access
Published: 03 July 2025

Scientific Reports volume 15, Article number: 23791 (2025)

Abstract

Graph-based methods have made significant progress in addressing the dependent correlations among ECG time series variables. However, most existing graph structures primarily focus on local similarity while overlooking global semantic correlation. Additionally, the adjacency matrix is highly susceptible to noise interference, leading to unreliable node connections. In this paper, we present a novel graph generation learning framework that incorporates semantic hash coding to capture the intricate associations both within and between ECG signals, thereby significantly enhancing the retrieval efficiency of subsequent graph-based deep learning models. Specifically, the Semantic Hash Similarity Graph (SHSG) initially leverages the similarity within the label space to generate a hash representation for the supervised signal. Subsequently, a lightweight linear hash function is utilized to produce a hash representation for the unseen signal. Thereafter, a comprehensive global hash dictionary is systematically constructed. Finally, the graph topology is meticulously assembled by leveraging Hamming similarity. Additionally, to ensure the maintenance of semantic similarity, we propose an iterative optimization approach in the orthogonal domain for generating hash representations. To validate the efficacy of the generated graph, we utilized a fast Graph Convolutional Network (GCN) for ECG recognition. The experimental outcomes on multiple publicly available ECG datasets corroborate the robustness and effectiveness of our proposed method.

Introduction

Electrocardiogram (ECG) is a widely employed tool for monitoring cardiovascular diseases in clinical settings. However, the interpretation of ECGs can be subjective and may vary among different observers. Consequently, the integration of computer-assisted diagnostic support can significantly enhance the accuracy and consistency of diagnoses, thereby improving the overall quality of the healthcare system, particularly when combined with the expertise of clinical professionals. Research on electrocardiogram detection is therefore of paramount importance for the advancement of intelligent medical treatment and has attracted growing attention in the field of machine learning^1,2,3,4,5,6.

An ECG records cardiac electrical activity through electrodes on the body surface. Each wave’s characteristics provide critical clinical information. A standard ECG beat comprises the P wave, QRS complex, and T wave, which have temporal and spatial dependencies. Owing to the rapid advancement in computing resources and the availability of extensive experimental datasets, a variety of deep learning models, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), have been successfully utilized for ECG detection and recognition. However, these models struggle to effectively capture complex neighborhood information, which limits their ability to parse the structure of the signal. Graph learning, leveraging its capacity to represent complex physical processes and irregular relationships, can enhance the accuracy of ECG signal detection and classification by treating signal segments as nodes.

In recent years, there has been a notable emergence of graph-based methods^7,8,9,10 within the field of ECG signal detection. These approaches have systematically validated the dependencies among signals by modeling variables as nodes in a graph structure and employing graph edges to characterize interactions between variables. This has significantly propelled the development of graph-based ECG anomaly detection technology. Such methods are fundamentally rooted in the framework of graph neural networks (GNNs)¹¹, which have demonstrated remarkable achievements in graph representation learning. However, experimental observations highlight significant limitations in the current graph construction strategies. For example, in our preliminary studies on the AF dataset¹², it was observed that KNN-based adjacency matrices (a) fail to effectively capture long-range semantic correlations, such as misclassifying arrhythmia patterns with similar local features but distinct global morphologies, and (b) demonstrate performance degradation under noisy conditions due to erroneous connections between dissimilar nodes. These issues arise because KNN-based methods:

Rely on local proximity and thus neglect global label semantics, for example conflating classes that share overlapping local features yet possess distinct clinical meanings.
Generate density-biased edges, which may result in redundant connections within high-density regions and insufficient coverage in sparsely represented areas.
Lack robustness to noise, as the similarity based on Euclidean distance is highly susceptible to disruption by artifacts.

The quality of an initial graph structure is significantly influenced by the connections among its nearest neighboring nodes. However, performing nearest neighbor search on large-scale datasets incurs prohibitively high computational complexity. To address this challenge, hash technology has been recognized as an efficient solution^13,14,15. It employs compact binary codes to represent high-dimensional data and enables low-complexity nearest neighbor search processes via Hamming distance calculations^16,17. Furthermore, the binary encoding strategy in hashing inherently possesses the capability to suppress feature noise¹⁸. Drawing inspiration from this, we leverage hashing for semantic reconstruction and introduce a novel graph structure learning framework. This framework defines graph nodes based on semantic hash similarity, rather than exclusively relying on nearest neighbor relationships. Employing semantic hash similarity for constructing graph structures provides several advantages: (1) Compared to the nearest neighbor approach, semantic hashing is more effective in capturing global semantic similarity between nodes, rather than focusing solely on local structural relationships. (2) Hash codes assess similarity through efficient bit operations, such as calculating the Hamming distance, which offers superior computational efficiency compared to traditional methods like Euclidean distance or cosine similarity. (3) Semantic hashing demonstrates robustness against noise and redundant information by effectively filtering out insignificant details when mapping data into a low-dimensional space via a hash function. The primary contributions can be succinctly summarized as follows:

We propose a concise semantic hash graph learning framework, which initially acquires the hash representation of signals and employs Hamming similarity to alleviate the rigid similarity in the label space, and simultaneously addresses the issue of incomplete or redundant adjacency in similar graphs.
We propose a hash learning optimization algorithm that learns semantically rich supervised hash representations. It additionally generates hash representations for unseen signals using simple hash functions, constructs a global hash dictionary by combining trained and unseen signal hashes, and establishes a global graph structure based on Hamming similarity.
By evaluating the generated graph structures using a classic fast GCN, the results demonstrate that the obtained graphs outperform current popular baseline methods, particularly in scenarios involving large intervals.

Related work

Graph-based representation and recognition of ECG signals

Owing to its powerful capability to represent complex data structures, the graph-based approach (GA) has gained significant popularity in the realm of ECG representation and detection. By modeling the interdependencies among signals, this method can uncover the underlying relationship patterns that exist between different signals. This attribute proves to be highly valuable for anomaly detection.

For example, Arora et al.⁷ proposed a multimodal graph neural network (GNN) to capture the intra- and inter-ECG dependencies. Yang et al.⁸ transformed the ECG from the time domain to the graph domain through the Graph Fourier Transform (GFT); they then converted the GFT results of the ECG into graph bispectra to extract graph integral bispectra for arrhythmia recognition. Chen et al.⁹ combined local subgraph learning with signal-enhanced graph learning to learn electrocardiogram representations. For the multiperceptive region, Chen et al.¹⁰ proposed a spatio-temporal graph convolutional contraction network for arrhythmia recognition. Zhang et al.¹⁹ introduced a novel spatio-temporal residual GCN for the diagnosis of coronary heart disease by segmenting the ECG signals into small, single-channel blocks and converting these blocks into graph nodes. Iqbal et al.²⁰ integrated two graph convolutional layers with a spatial attention mechanism to capture the local and global dependencies, thereby achieving more effective classification of arrhythmias.

The aforementioned methods have improved the detection efficiency of ECG through the use of graph-based approaches. However, the predefined graphs utilized in these methods focus solely on the similarity between neighboring nodes, thereby neglecting global semantic correlations. Additionally, the construction of such graphs involves quadratic complexity, rendering them inefficient for large-scale datasets.

Graph structure learning

The quality of a graph-based model is directly influenced by the quality of the graph. Rapidly constructing a high-quality graph is crucial for advancing graph-based learning. Graph structure learning (GSL)^21,22,23 can significantly enhance the quality of graphs by reconstructing or refining the original graph topology, which has recently garnered significant research attention^24,25,26.

The two most popular methods for constructing graphs are k-nearest-neighbor graphs (kNN graphs)²⁷ and $\epsilon$ proximity thresholding ($\epsilon$-graphs)²⁸. A kNN graph connects each node to its k nearest neighbors, whereas the latter links two nodes when the distance between them is less than the preset threshold $\epsilon$. Nevertheless, these two approaches not only entail significant computational complexity but also fail to address the challenge of sparse data effectively. Furthermore, it is possible for similar nodes to possess distinct labels, indicating that the initial graph constructed based on neighboring nodes may also incorporate a significant amount of noise²⁹. Consequently, scholars have proposed numerous graph regularization methodologies to address the aforementioned challenges. These include sparse constraints³⁰, smoothness constraints³¹, low-rank constraints^32,33, matrix factorization^34,35 and so on. However, the constraints employed in these methods are imposed solely on the learner, while the graph structure is treated as a predefined condition. Fundamentally, these approaches still fall into the categories of kNN graphs or $\epsilon$-graphs.
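As a point of reference, the two baseline constructions described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the cited works: the function names `knn_graph` and `epsilon_graph`, the Euclidean metric, and the symmetrization rule are assumptions made here. Both routines build the full pairwise distance matrix, which makes the quadratic cost explicit.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between the rows of X (n x d)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding

def knn_graph(X, k):
    """Binary adjacency: connect each node to its k nearest neighbors."""
    n = X.shape[0]
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)          # exclude self-loops
    idx = np.argsort(D, axis=1)[:, :k]   # k closest nodes per row
    A = np.zeros((n, n), dtype=int)
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1
    return np.maximum(A, A.T)            # assumed rule: keep an edge if either side selects it

def epsilon_graph(X, eps):
    """Binary adjacency: connect nodes whose distance is below eps."""
    A = (pairwise_distances(X) < eps).astype(int)
    np.fill_diagonal(A, 0)
    return A
```

Note that neither routine looks at labels, so two nearby nodes with different diagnoses are still connected, which is the noise source the paragraph above points to.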

Unlike models that solely focus on local neighborhoods, the Transformer architecture³⁶ facilitates message propagation between all nodes, thereby uncovering novel relationships. Kreuzer et al.³⁷ propose a learnable position encoding mechanism that employs an additional Transformer encoder to transform the Laplace spectrum of the graph. Dwivedi et al.³⁸ proposed the decoupling of structural embeddedness from positional embeddedness, allowing both to be updated in tandem with other parameters. These methods, which leverage kernel functions for node representation or encode both local and global characteristics of the graph structure through additional diffusion kernel, do enhance the structural optimization of the graph. However, they all rely on deep learning techniques and entail considerable training time and computational expenses.

Graph learning with hash-based approaches

Owing to its low retrieval time complexity ($\mathscr {O}(1)$) and minimal storage requirements, hash coding has garnered significant attention from researchers in recent years^39,40. Several methodologies integrate hash-based techniques with graph learning approaches. For instance, Lu et al.³⁹ introduced the Asymmetric Transfer Hashing (ATH) framework to address the cross-domain heterogeneous search challenge. They optimized domain distribution and feature discrepancies by employing a dual-domain hash function and an adaptive bipartite graph. Huang et al.⁴¹ employ a graph embedding learning approach to generate hash codes suitable for model replication detection through the design of a fusion quantization and triplet loss hash network. The Hamming Spatial Graph Convolutional Network (HS-GCN)⁴² model incorporated the concept of Hamming similarity into its framework, thereby enhancing the representation of both users and items. Wang et al.⁴³ employed anchor graphs to effectively capture the intrinsic neighborhood structure and subsequently developed a graph-based hashing model.

In contrast to existing methods that primarily focus on learning hash codes utilizing graph structures, this paper proposes a novel approach where hash coding is employed to infer graph structures. This method not only reduces the computational complexity associated with the initial adjacency matrix but also mitigates the impact of graph noise, thereby providing a semantically enriched initial graph for graph-based deep learning models.

Materials and methods

The focus of this section lies on the proposed architecture for similarity reconstruction based on semantic hash. Once the global graph is generated, it is forwarded to fast GCN for efficient ECG detection. The entire process is depicted in Fig. 1. The section provides a comprehensive exposition of each component in the proposed approach, encompassing its formulation and solution process. Additionally, we elucidate the algorithm’s complexity and convergence properties.

Preliminary and notations

Throughout the paper, vectors (matrices) are denoted by lowercase (uppercase) bold letters, e.g., ${\textbf {b}}$ and ${\textbf {X}}$. $\Vert {\textbf {X}}\Vert _F=\sqrt{\sum _{i=1}^p\sum _{j=1}^{q}x_{ij}^2}$ is the Frobenius norm of a matrix ${\textbf {X}}=(x_{ij})_{p\times q}$ and $\Vert {\textbf {b}}\Vert _2=\sqrt{\sum _{i=1}^{p}b_i^2}$ is used to denote the $\ell _2$-norm of a vector ${\textbf {b}}\in \mathscr {R}^p$. Notation $\mathscr {S}_n^p$ denotes an orthogonal feasible region and $tr({\textbf {X}})$ denotes the trace of matrix ${\textbf {X}}$. We use ${\textbf {B}}=[{\textbf {b}}_1,\ldots ,{\textbf {b}}_n]^T\in \{-1,1\}^{n\times k}$ to denote the hash dictionary. The ECG training dataset is denoted as $\mathscr {D}=\{{\textbf {x}}_i,y_i\}_{i=1}^n$ with n samples, where ${\textbf {x}}_i\in \mathscr {R}^d$ and $y_i\in \{1,2,\ldots , c\}\subset \mathscr {N}$ are the ith sample and its corresponding label, respectively.

Let ${\textbf {G}}=\{{\textbf {V}},{\textbf {E}},{\textbf {X}}\}$ denote a graph, where ${\textbf {V}}=\{v_1,\ldots , v_n\}$ represents a sequence of nodes with $|{\textbf {V}}|=n$ and ${\textbf {E}}$ represents the set of edges connecting them. Edges depict the relationships between nodes and can also be represented as an adjacency matrix ${\textbf {A}}=[a_{ij}]\in \mathscr {R}^{n\times n}$, with $a_{ij}$ denoting the relationship between nodes $v_i$ and $v_j$.

Additionally, the optimization process employs the following notations: the i-th column of matrix ${\textbf {X}}\in \mathscr {R}^{p\times q}$ is denoted by ${\textbf {x}}_i$. ${\textbf {X}}_{\overline{i}}=[{\textbf {x}}_1,\cdots ,{\textbf {x}}_{i-1},{\textbf {x}}_{i+1},\ldots ,{\textbf {x}}_q]\in \mathscr {R}^{p\times {(q-1)}}$ means that the i-th column of ${\textbf {X}}$ is removed. ${\textbf {X}}_{i,v}=[{\textbf {x}}_1,\cdots ,{\textbf {x}}_{i-1},{\textbf {v}},{\textbf {x}}_{i+1},\ldots ,{\textbf {x}}_q]\in \mathscr {R}^{p\times q}$ denotes ${\textbf {X}}$ with its i-th column replaced by a given vector ${\textbf {v}}$. And $\otimes$ denotes the element-wise product. Finally, $\mathscr {B}({\textbf {c}},r)=\{{\textbf {x}}\in \mathscr {R}^p|\Vert {\textbf {x}}-{\textbf {c}}\Vert _2\le r\}$ is a p-dimensional hypersphere, where ${\textbf {c}}\in \mathscr {R}^p$ is the center and r is the radius.

Semantic hash similarity graph learning

Problem formulation

The binary-coded hash representation of data ensures consistent similarity between data and semantic affinity. Building upon this foundation, a novel framework has recently emerged in the field of hash learning, aiming to bridge the spatial gap between semantic space and Hamming space by distilling discrete hash codes from semantic affinity. The next step will involve the creation of a hash dictionary.

The approach specifically utilizes one-hot label vectors for constructing semantic affinity and subsequently employs discrete symmetric matrix factorization to generate discriminative compact hash codes. Given any two label vectors ${\textbf {y}}_i,{\textbf {y}}_j \in \{0,1\}^{c}$, their similarity can be represented by $S_{ij}=\langle {\textbf {y}}_i,{\textbf {y}}_j\rangle$ or $S_{ij}=\frac{\langle {\textbf {y}}_i,{\textbf {y}}_j\rangle }{\Vert {\textbf {y}}_i\Vert \Vert {\textbf {y}}_j\Vert }$ (multi-label). The affinity matrix can therefore be expressed as ${\textbf {S}}={\textbf {Y}}^T{\textbf {Y}}\in \mathscr {R}^{n\times n}$, where ${\textbf {Y}}\in \{0,1\}^{c\times n}$ stacks the label vectors as columns. The symmetric decomposition model seeks binary hash row vectors ${\textbf {b}}_i \in \{-1,1\}^k$ of code length k that satisfy ${\textbf {b}}_i{\textbf {b}}_j^T\approx kS_{ij}$ as closely as possible; in matrix form, ${\textbf {B}} {\textbf {B}}^ T\approx k{\textbf {S}}$. The resulting objective function is:

$$\begin{aligned} \min \limits _{{\textbf {B}}\in \{-1,1\}^{n\times k}}\mathscr {F}=\Vert k{\textbf {S}}-{\textbf {BB}}^ T\Vert _F^2, s.t.\; {\textbf {B}}^T{\textbf {B}}=n{\textbf {I}}_k. \end{aligned}$$

(1)
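To make Eq. 1 concrete, the following minimal sketch builds the label affinity ${\textbf {S}}={\textbf {Y}}^T{\textbf {Y}}$ from one-hot labels and evaluates the objective for a hand-picked code matrix. The helper names `one_hot_columns` and `eq1_objective` are hypothetical, labels are assumed to be 0-indexed (the paper indexes them from 1), and the balance constraint ${\textbf {B}}^T{\textbf {B}}=n{\textbf {I}}_k$ is not enforced here.

```python
import numpy as np

def one_hot_columns(labels, c):
    """Return Y in {0,1}^(c x n) with one one-hot column per sample."""
    labels = np.asarray(labels)
    Y = np.zeros((c, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y

def eq1_objective(B, S):
    """Frobenius loss ||k S - B B^T||_F^2 of Eq. 1 (constraint not checked)."""
    k = B.shape[1]
    return np.linalg.norm(k * S - B @ B.T, "fro") ** 2

labels = [0, 0, 1]                 # two samples of class 0, one of class 1
Y = one_hot_columns(labels, c=2)
S = Y.T @ Y                        # S[i, j] = 1 iff samples i and j share a label
B = np.array([[1, 1], [1, 1], [1, -1]])
print(eq1_objective(B, S))         # prints 0.0: B B^T equals k S exactly
```

In this toy case samples with the same label receive identical codes and samples with different labels receive codes with zero inner product, so the affinity is reproduced exactly.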

Equation 1 represents the classical objective function for learning hash codes based on semantics^44,45,46,47. However, due to the impact of the binary discrete constraint inherent in hashing, optimizing this function has consistently been a challenging problem in the field. Currently, there are primarily two strategies for optimizing Eq. 1: introducing intermediate variables to preserve the discreteness of hash coding^44,45, or employing a semi-relaxation strategy for optimization^46,47.

Unlike the aforementioned methods, we propose an equivalent objective function to overcome the optimization difficulty of Eq. 1. By re-evaluating Eq. 1, we find that its aim is $k{\textbf {S}}\approx {\textbf {BB}}^T$, i.e., $k{\textbf {Y}}^T{\textbf {Y}}\approx {\textbf {BB}}^T$. Achieving $\sqrt{k}{{\textbf {Y}}}^T\approx {\textbf {B}}$ would serve as a viable alternative to this objective. However, the label space (dimension c) and the Hamming space (dimension k) differ in dimension, so this cannot be achieved directly. To resolve this, we introduce an orthogonal rotation factor ${\textbf {R}}\in \mathscr {R}^{c\times k }$ to bridge the dimensional difference, i.e., $\sqrt{k}{{\textbf {Y}}}^T\approx {\textbf {B}} {\textbf {R}}^T$. As long as ${\textbf {R}}^T {\textbf {R}}\equiv {\textbf {I}}_k$, we have $({\textbf {B}} {\textbf {R}}^T)({\textbf {B}} {\textbf {R}}^T)^T={\textbf {B}} {\textbf {R}}^T{\textbf {R}} {\textbf {B}}^T={\textbf {B}} {\textbf {B}}^T$, and hence ${\textbf {B}} {\textbf {B}}^T\approx \sqrt{k}{{\textbf {Y}}}^T(\sqrt{k}{{\textbf {Y}}}^T)^T=k{\textbf {Y}}^T{\textbf {Y}}$. It is worth noting that this equivalence holds only if the rotation factor remains orthogonal. In other words, the subsequent optimization process must ensure that the rotation factor consistently resides on the Stiefel manifold. To sum up, Eq. 1 can be equivalently expressed as:

$$\begin{aligned} \min \limits _{{\textbf {B}}\in \{-1,1\}^{n\times k},{\textbf {R}}\in \mathscr {R}^{c\times k }}\mathscr {F}=\Vert \widetilde{{\textbf {Y}}}^T-{\textbf {BR}}^T\Vert _F^2, s.t.\; {\textbf {B}}^T{\textbf {B}}=n{\textbf {I}}_k, {\textbf {R}}^T {\textbf {R}}\equiv {\textbf {I}}_k. \end{aligned}$$

(2)

where $\widetilde{{\textbf {Y}}}=\sqrt{k} {\textbf {Y}}$. The subsequent optimization then shifts its focus to ${\textbf {R}}$. To approximate Eq. 1 effectively, ${\textbf {R}}$ must remain orthogonal at every iteration, so that it stays within the orthogonal feasible region.
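The role of the orthogonality constraint can be checked numerically. The sketch below is illustrative only: the helper `random_orthonormal` is hypothetical, it draws ${\textbf {R}}$ via a QR decomposition rather than by the paper's optimizer, and it assumes $c\ge k$ so that ${\textbf {R}}^T{\textbf {R}}={\textbf {I}}_k$ is attainable.

```python
import numpy as np

def random_orthonormal(c, k, rng):
    """Return a c x k matrix with orthonormal columns (requires c >= k)."""
    Q, _ = np.linalg.qr(rng.standard_normal((c, k)))  # reduced QR: Q is c x k
    return Q

rng = np.random.default_rng(0)
n, c, k = 6, 5, 3
B = rng.choice([-1.0, 1.0], size=(n, k))   # arbitrary binary codes
R = random_orthonormal(c, k, rng)

gram_rotated = (B @ R.T) @ (B @ R.T).T     # (B R^T)(B R^T)^T
gram_codes = B @ B.T                       # B B^T
print(np.allclose(gram_rotated, gram_codes))
```

If the columns of ${\textbf {R}}$ drift away from orthonormality during optimization, the two Gram matrices no longer agree, which is why the later update steps are designed to keep ${\textbf {R}}$ feasible.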

Optimization

Due to the indefinite magnitudes of c and k, the strategy based on Stiefel manifold is not applicable to Eq. 2. Therefore, we adopt a two-step optimization approach. Firstly, we use the proximal linearized approximation of the augmented Lagrangian algorithm^48,49 to minimize the function value. Then, we utilize column-wise block coordinate descent method to handle orthogonal constraints. Eq. 2 w.r.t. ${\textbf {R}}$ can be formulated as:

$$\begin{aligned} \min \limits _{{\textbf {R}}\in \mathscr {R}^{c\times k }}\mathscr {F}({\textbf {R}})=tr({\textbf {RB}}^T{\textbf {BR}}^T)-2tr(\widetilde{{\textbf {Y}}} {\textbf {BR}}^T) s.t.\; {\textbf {R}}^T {\textbf {R}}\equiv {\textbf {I}}_k. \end{aligned}$$

(3)

The Lagrangian function of Eq. 3 can be expressed as follows, where $\Lambda$ represents the Lagrangian multiplier matrix for the orthogonal constraint:

$$\begin{aligned} \mathscr {L}=\mathscr {F}({\textbf {R}})-\frac{1}{2}tr(\Lambda ({\textbf {R}}^T {\textbf {R}}-{\textbf {I}}_k)). \end{aligned}$$

(4)

Then the gradient of Eq. 4 with respect to ${\textbf {R}}$ is $\nabla _{{\textbf {R}}}\mathscr {L}=\nabla \mathscr {F}({\textbf {R}})-{\textbf {R}}\Lambda$. Setting $\nabla _{{\textbf {R}}}\mathscr {L}=0$ and combining it with the first-order optimality conditions (substationarity: $({\textbf {I}}_c-{\textbf {R}} {\textbf {R}}^T)\nabla \mathscr {F}({\textbf {R}})=0$; symmetry: ${\textbf {R}}^T\nabla \mathscr {F}({\textbf {R}})=\nabla \mathscr {F}({\textbf {R}})^T{\textbf {R}}$; feasibility: ${\textbf {R}}^T{\textbf {R}}={\textbf {I}}_k$), we have $\Lambda ={\textbf {R}}^T \nabla \mathscr {F}({\textbf {R}})=\nabla \mathscr {F}({\textbf {R}})^T{\textbf {R}}$, $\nabla \mathscr {F}({\textbf {R}})={\textbf {R}}\Lambda$ and $\nabla \mathscr {F}({\textbf {R}})-{\textbf {R}}\nabla \mathscr {F}({\textbf {R}})^T{\textbf {R}}=0$.

In order to uphold the orthogonality constraint, we incorporate an additional penalty term for orthogonality. Thus, Eq. 4 can be reformulated as follows:

$$\begin{aligned} \mathscr {L}_{\beta }=\mathscr {F}({\textbf {R}})-\frac{1}{2}tr(\Lambda ({\textbf {R}}^T {\textbf {R}}-{\textbf {I}}_k))+\frac{\beta }{4}\Vert {\textbf {R}}^T {\textbf {R}} -{\textbf {I}}_k\Vert _F^2. \end{aligned}$$

(5)

where $\beta$ is the penalty regulation coefficient. The gradient $\nabla _{{\textbf {R}}}\mathscr {L}_{\beta }$ can then be written as:

$$\begin{aligned} \nabla _{{\textbf {R}}}\mathscr {L}_{\beta }({\textbf {R}},\Lambda )=\nabla \mathscr {F}({\textbf {R}})-{\textbf {R}}\Lambda -\beta {\textbf {R}}({\textbf {I}}_k-{\textbf {R}}^T {\textbf {R}}). \end{aligned}$$

(6)

For the current iteration ${\textbf {R}}^{(v)}$, the proximal linearized augmented Lagrangian function is defined as:

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{lr} \Lambda ^{(v)}=\Psi (\nabla \mathscr {F}({\textbf {R}}^{(v)})^T{\textbf {R}}^{(v)}),& \\ \widetilde{\mathscr {L}}_{\beta }({\textbf {R}})=tr({\textbf {G}}^T({\textbf {R}}-{\textbf {R}}^{(v)}))+\frac{\eta }{2}\Vert {\textbf {R}}-{\textbf {R}}^{(v)}\Vert _F^2,& \\ {\textbf {R}}^{(v+1)}={\arg \min }_{{\textbf {R}}}\widetilde{\mathscr {L}}_{\beta }({\textbf {R}}).& \end{array} \right. \end{aligned} \end{aligned}$$

(7)

where ${\textbf {G}}=\nabla _{{\textbf {R}}}\mathscr {L}_{\beta }({\textbf {R}}^{(v)},\Lambda ^{(v)})$, $\frac{1}{\eta }$ is the gradient step size, and $\Psi ({\textbf {A}})=\frac{{\textbf {A}}+{\textbf {A}}^T}{2}$ symmetrizes its argument so that the symmetry requirement of the first-order optimality conditions is satisfied.
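One iteration of Eq. 7 can be sketched as follows. This is an illustrative reading of Eqs. 3, 6 and 7 rather than the authors' code: the function names are hypothetical, the gradient $\nabla \mathscr {F}({\textbf {R}})=2{\textbf {R}}{\textbf {B}}^T{\textbf {B}}-2\widetilde{{\textbf {Y}}}{\textbf {B}}$ is derived here from Eq. 3, and the unconstrained minimizer of the linearized function is taken to be ${\textbf {R}}^{(v)}-{\textbf {G}}/\eta$.

```python
import numpy as np

def grad_F(R, B, Y_tilde):
    """Gradient of F(R) = tr(R B^T B R^T) - 2 tr(Y_tilde B R^T) (Eq. 3)."""
    return 2.0 * R @ (B.T @ B) - 2.0 * Y_tilde @ B

def sym(A):
    """Psi(A) = (A + A^T) / 2."""
    return 0.5 * (A + A.T)

def proximal_step(R, B, Y_tilde, beta, eta):
    """One proximal linearized augmented Lagrangian step (Eq. 7).

    Returns the unconstrained update R - G / eta together with G.
    The result is generally NOT orthogonal; feasibility is restored
    separately by the column-wise update.
    """
    k = R.shape[1]
    gF = grad_F(R, B, Y_tilde)
    Lam = sym(gF.T @ R)                                       # multiplier estimate
    G = gF - R @ Lam - beta * R @ (np.eye(k) - R.T @ R)       # Eq. 6
    return R - G / eta, G
```

When ${\textbf {R}}$ already has orthonormal columns the penalty term vanishes and ${\textbf {G}}$ reduces to $\nabla \mathscr {F}({\textbf {R}})-{\textbf {R}}\Lambda$.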

For redundant spherical constraints in the orthogonal feasible region, we add an additional term for $\nabla \mathscr {F}({\textbf {R}})$, i.e., $\nabla \mathscr {F}({\textbf {R}})={\textbf {R}}\Lambda +{\textbf {R}} {\textbf {D}}$, where ${\textbf {D}}$ is a diagonal matrix determined by the Lagrangian multiplier. Therefore, the iterative formula of $\Lambda ^{(v)}$ in Eq. 7 can be expressed as:

$$\begin{aligned} \Lambda ^{(v)}=\Psi (\nabla \mathscr {F}({\textbf {R}}^{(v)})^T{\textbf {R}}^{(v)})+\Xi ^{(v)}. \end{aligned}$$

(8)

where $\Xi =Diag({\textbf {R}}^T\nabla _{{\textbf {R}}}\mathscr {L}_{\beta }({\textbf {R}},\Psi (\nabla \mathscr {F}({\textbf {R}})^T{\textbf {R}})))$.

The updating rule of variable ${\textbf {R}}$ provided by Eq. 7 does not guarantee that each iteration optimization is within the orthogonal feasible region. In order to address this issue, we adopt column-wise block coordinate descent method to achieve the second objective optimization of Eq. 7. Concretely, we fix the $k-1$ columns of ${\textbf {R}}$ and use the i-th column as a variable. The sub-target of Eq. 7 can be expressed as:

$$\begin{aligned} \min \limits _{{\textbf {r}}\in \mathscr {R}^c}\widetilde{\mathscr {L}}^i_{\beta ,{\textbf {R}}}({\textbf {r}}), s.t.\; \Vert {\textbf {r}}\Vert _2=1, {\textbf {R}}_{\overline{i}}^T {\textbf {r}}=0. \end{aligned}$$

(9)

where $\widetilde{\mathscr {L}}^i_{\beta ,{\textbf {R}}}({\textbf {r}})=\widetilde{\mathscr {L}}_\beta ({\textbf {R}}_{i,r})$. The constraint ${\textbf {R}}_{\overline{i}}^T {\textbf {r}}={\textbf {0}}$ indicates the solution ${\textbf {r}}^*\in Null({\textbf {R}}_{\overline{i}})=\{{\textbf {r}}\in \mathscr {R}^c|{\textbf {R}}_{\overline{i}}^T {\textbf {r}}={\textbf {0}}\}$, which is equivalent to ${\textbf {r}}=({\textbf {I}}_c-{\textbf {R}}_{\overline{i}} {\textbf {R}}_{\overline{i}}^T){\textbf {r}}$.

Therefore, Eq. 9 can be simplified as:

$$\begin{aligned} \min \limits _{{\textbf {r}}\in \mathscr {R}^c}f_i({\textbf {r}})=\widetilde{\mathscr {L}}^i_{\beta ,{\textbf {R}}}(({\textbf {I}}_c-{\textbf {R}}_{\overline{i}} {\textbf {R}}_{\overline{i}}^T){\textbf {r}}), s.t.\; \Vert {\textbf {r}}\Vert _2=1, {\textbf {r}}\in Null({\textbf {R}}_{\overline{i}}). \end{aligned}$$

(10)

For the i-th column ${\textbf {r}}_i$, the gradient of Eq. 10 can be expressed as:

$$\begin{aligned} \nabla f_i({\textbf {r}}_i)=({\textbf {I}}_c-{\textbf {R}}_{\overline{i}} {\textbf {R}}_{\overline{i}}^T)\nabla \widetilde{\mathscr {L}}^i_{\beta ,{\textbf {R}}}(({\textbf {I}}_c-{\textbf {R}}_{\overline{i}} {\textbf {R}}_{\overline{i}}^T){\textbf {r}}_i) =({\textbf {I}}_c-{\textbf {R}}_{\overline{i}} {\textbf {R}}_{\overline{i}}^T)\nabla f_i({\textbf {r}}_i). \end{aligned}$$

(11)

From Eq. 11, it follows that $\nabla f_i({\textbf {r}}_i)\in Null({\textbf {R}}_{\overline{i}})$. Therefore, both ${\textbf {r}}_i$ and $\nabla f_i({\textbf {r}}_i)$ lie in $Null({\textbf {R}}_{\overline{i}})$. And we know that the null space of ${\textbf {R}}_{\overline{i}}$ is the orthocomplement of the row space of ${\textbf {R}}_{\overline{i}}$, i.e., $(Row({\textbf {R}}_{\overline{i}}))^{\bot }=Null({\textbf {R}}_{\overline{i}})$, so any point in the span of ${\textbf {r}}_i$ and $\nabla f_i({\textbf {r}}_i)$, i.e., $Span\{{\textbf {r}}_i,\nabla f_i({\textbf {r}}_i)\}$, satisfies the orthogonal constraint. Let $\widetilde{{\textbf {r}}_i}={\textbf {r}}_i -\iota \nabla f_i({\textbf {r}}_i)\in Span\{{\textbf {r}}_i,\nabla f_i({\textbf {r}}_i)\}$; normalizing it gives the orthogonal feasible solution:

$$\begin{aligned} {\textbf {r}}_i^{(v+1)}=\frac{{\textbf {r}}_i^{(v)} -\iota \nabla f_i({\textbf {r}}_i^{(v)})}{\Vert {\textbf {r}}_i^{(v)} -\iota \nabla f_i({\textbf {r}}_i^{(v)})\Vert _2}. \end{aligned}$$

(12)

where $\iota =\frac{1}{\eta }$ is the step size.
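The column-wise update of Eqs. 11 and 12 can be sketched as below. The function name `update_column` and its argument `grad_col` (the gradient of the linearized function with respect to column i, for instance the i-th column of ${\textbf {G}}$) are assumptions made for illustration; the sketch also assumes the current ${\textbf {R}}$ already has orthonormal columns.

```python
import numpy as np

def update_column(R, i, grad_col, step):
    """Update column i of R while keeping all columns orthonormal.

    R        : c x k matrix with orthonormal columns
    grad_col : length-c gradient with respect to column i (assumed given)
    step     : step size iota = 1 / eta
    """
    c = R.shape[0]
    R_rest = np.delete(R, i, axis=1)            # R with column i removed
    P = np.eye(c) - R_rest @ R_rest.T           # projector onto Null(R_rest)
    g = P @ grad_col                            # projected gradient (Eq. 11)
    r_new = R[:, i] - step * g                  # stays orthogonal to R_rest
    r_new = r_new / np.linalg.norm(r_new)       # unit length (Eq. 12)
    R_out = R.copy()
    R_out[:, i] = r_new
    return R_out
```

Because both the old column and the projected gradient are orthogonal to the remaining columns, their combination is too, so only the normalization is needed to return to the feasible region.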

The hash dictionary can be optimized immediately once the rotation factor ${\textbf {R}}$ has been updated. Revisiting Eq. 3 with ${\textbf {R}}$ fixed, minimizing the objective over ${\textbf {B}}$ is equivalent to maximizing:

$$\begin{aligned} \max \mathscr {F}({\textbf {B}})=tr(\widetilde{{\textbf {Y}}} {\textbf {BR}}^T)=tr({\textbf {B}}^T\widetilde{{\textbf {Y}}}^T{\textbf {R}}). \end{aligned}$$

(13)

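Since $tr({\textbf {B}}^T{\textbf {M}})=\sum _{i,j}b_{ij}m_{ij}\le \sum _{i,j}|m_{ij}|$ for $b_{ij}\in \{-1,1\}$, with equality when each $b_{ij}$ takes the sign of $m_{ij}$, setting ${\textbf {M}}=\widetilde{{\textbf {Y}}}^T{\textbf {R}}$ gives the concise closed form of the hash dictionary: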

$$\begin{aligned} {\textbf {B}}=sgn(\widetilde{{\textbf {Y}}}^T{\textbf {R}}). \end{aligned}$$

(14)
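Eq. 14 amounts to a single matrix product followed by an entrywise sign. The sketch below is a minimal illustration with a hypothetical function name; mapping exact zeros to +1 is an assumption added here so that every code entry lies in $\{-1,1\}$.

```python
import numpy as np

def hash_dictionary(Y, R):
    """Closed-form code update B = sgn(Y_tilde^T R) (Eq. 14).

    Y : c x n one-hot label matrix
    R : c x k rotation factor
    """
    k = R.shape[1]
    Y_tilde = np.sqrt(k) * Y
    B = np.sign(Y_tilde.T @ R)      # n x k
    B[B == 0] = 1.0                 # assumed tie-breaking rule
    return B
```

With one-hot labels, row i of $\widetilde{{\textbf {Y}}}^T{\textbf {R}}$ is a scaled copy of the row of ${\textbf {R}}$ indexed by the label of sample i, so samples that share a label receive identical codes.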

Unseen signal hash representation

Mapping from the original feature space to the Hamming space requires an explicit hash function, which serves as a practical tool for generating hash codes for previously unseen signal data. For simplicity, we adopt a linear hash function for this mapping.

We minimize the following square loss:

$$\begin{aligned} \mathscr {H}({\textbf {P}})=\Vert {\textbf {B}}^T-{\textbf {P}} {\textbf {X}}\Vert _F^2+\lambda \Vert {\textbf {P}}\Vert _F^2. \end{aligned}$$

(15)

where $\lambda$ is a regularization parameter used to prevent overfitting. Setting the gradient of Eq. 15 with respect to ${\textbf {P}}$ to zero gives ${\textbf {P}}={\textbf {B}}^T{\textbf {X}}^T({\textbf {X}} {\textbf {X}}^T+\lambda {\textbf {I}}_d)^{-1}$. Given an unseen signal dataset ${\textbf {X}}_{te}\in \mathscr {R}^{d\times m}$, we can efficiently produce its hash representation ${\textbf {B}}_{te}^T=sgn({\textbf {P}} {\textbf {X}}_{te})$ using this linear hash function.
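The closed-form solution above is a ridge regression from signals to codes. A minimal sketch follows; the function names are hypothetical, a linear solve is used in place of the explicit inverse for numerical stability, and exact zeros are mapped to +1 by assumption.

```python
import numpy as np

def fit_hash_projection(B, X, lam):
    """Solve min_P ||B^T - P X||_F^2 + lam ||P||_F^2.

    B : n x k training codes, X : d x n training signals.
    Returns P (k x d) = B^T X^T (X X^T + lam I_d)^{-1}.
    """
    d = X.shape[0]
    A = X @ X.T + lam * np.eye(d)             # d x d, symmetric positive definite
    return np.linalg.solve(A, X @ B).T        # (A^{-1} X B)^T = B^T X^T A^{-1}

def hash_unseen(P, X_te):
    """Codes for unseen signals: B_te = sgn(P X_te)^T, shape m x k."""
    B_te = np.sign(P @ X_te).T
    B_te[B_te == 0] = 1.0                     # assumed tie-breaking rule
    return B_te
```

Only the d x d system depends on the training set size through the product ${\textbf {X}}{\textbf {X}}^T$, so encoding a new signal costs a single k x d matrix-vector product.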

Combining the training hash dictionary with the unseen hash representation yields the global hash representation ${\textbf {B}}_{g}^T=[{\textbf {B}}^T, {\textbf {B}}_{te}^T]$. The global graph similarity, i.e., the semantic hash similarity graph ${\textbf {A}}_H$, is then computed from the pairwise Hamming similarity of the hash codes. The overall procedure is summarized in Algorithm 1.
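The graph construction step can be sketched as follows. The paper does not spell out how Hamming similarity is normalized, so this sketch assumes similarity $=1-d_H/k$, where the Hamming distance between two $\pm 1$ codes is obtained from their inner product as $d_H=(k-\langle {\textbf {b}}_i,{\textbf {b}}_j\rangle )/2$. The function name is hypothetical.

```python
import numpy as np

def hamming_similarity_graph(B_train, B_test):
    """Stack training and unseen codes, then compute pairwise Hamming similarity.

    B_train : n x k codes in {-1, 1}
    B_test  : m x k codes in {-1, 1}
    Returns an (n + m) x (n + m) matrix with entries in [0, 1].
    """
    B_g = np.vstack([B_train, B_test])        # global hash dictionary
    k = B_g.shape[1]
    hamming_dist = (k - B_g @ B_g.T) / 2      # number of differing bits
    return 1.0 - hamming_dist / k             # assumed normalization
```

A single matrix product yields all $(m+n)^2$ similarities, which matches the cost stated for generating the global graph.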

Dynamic similarity graph

The aforementioned semantic hash similarity graph can be derived efficiently from the label space. However, it does not consider the structural interconnections among the original signals, so structural similarity must also be incorporated to construct the final similarity structure. Previous approaches for constructing similarity matrices typically rely on the inner product or Euclidean distance, which are not suitable for paired time series because cascade effects between the series can introduce temporal lag. The Dynamic Time Warping (DTW) distance⁵⁰ is a robust measure of signal similarity that accounts for temporal variations. Therefore, we utilize the DTW distance to construct a similarity structure over the original signals.

Formally, given any two signal sequences ${\textbf {x}}_{i}, {\textbf {x}}_{j}$, their similarity ${\textbf {A}}_D^{ij}$ is computed utilizing the unbounded distance of DTW.

$$\begin{aligned} {\textbf {A}}_D^{ij}=exp\left( -\frac{DTW({\textbf {x}}_{i}, {\textbf {x}}_{j},\omega )}{\epsilon }\right) . \end{aligned}$$

(16)

where $\epsilon$ is a hyper-parameter, $\omega$ is local constraint window size, and ${\textbf {A}}_D$ is referred to as dynamic similarity graph. The ultimate global graph structure is formed by the weighted summation of the semantic similarity graph and the dynamic structural similarity graph, i.e., ${\textbf {A}}=\kappa {\textbf {A}}_D+(1-\kappa ){\textbf {A}}_H$. The above hyper-parameters are set to $\epsilon =0.5$, $\omega =3$ and $\kappa =0.3$ in order to minimize parameter interdependence and reduce algorithm debugging time.
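Eq. 16 and the fusion rule can be sketched as follows. This is a plain dynamic-programming DTW for one-dimensional sequences, not an optimized or library implementation; the absolute difference as local cost and the band widening for unequal lengths are assumptions, and the function names are hypothetical. The defaults mirror the reported settings $\epsilon =0.5$, $\omega =3$ and $\kappa =0.3$.

```python
import numpy as np

def dtw_distance(x, y, window):
    """DTW distance between 1-D sequences with a band of half-width `window`."""
    n, m = len(x), len(y)
    w = max(window, abs(n - m))               # band must cover the length gap
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])   # assumed local cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dynamic_similarity_graph(signals, window=3, eps=0.5):
    """A_D[i, j] = exp(-DTW(x_i, x_j, window) / eps) (Eq. 16)."""
    n = len(signals)
    A_D = np.ones((n, n))                     # DTW(x, x) = 0 gives similarity 1
    for i in range(n):
        for j in range(i + 1, n):
            s = np.exp(-dtw_distance(signals[i], signals[j], window) / eps)
            A_D[i, j] = A_D[j, i] = s
    return A_D

def fuse_graphs(A_D, A_H, kappa=0.3):
    """Global graph A = kappa * A_D + (1 - kappa) * A_H."""
    return kappa * A_D + (1.0 - kappa) * A_H
```

Unlike the Euclidean distance, the warping path lets a beat that is shifted by a few samples align with its counterpart, so two time-lagged copies of the same pattern can have zero distance.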

Convergence analysis and complexity discussion

Next, we discuss the convergence of Algorithm 1. Algorithm 1 iterates over two variables, but the update of the hash dictionary does not play a crucial role in minimizing the objective, because learning the hash dictionary essentially amounts to selecting the best vertex of a k-dimensional hypercube. Therefore, we focus only on the iterative convergence of the rotation factor ${\textbf {R}}$. The experimental section provides empirical validation, while the appendix presents the theoretical analysis.

The complexity analysis of Algorithm 1 is outlined below. Initializing ${\textbf {R}}$ and ${\textbf {B}}$ takes $\mathscr {O}(nk+ck)$. Each iteration to update ${\textbf {R}}$ and ${\textbf {B}}$ requires calculation costs of $\mathscr {O}(ck+4ck^2)$ and $\mathscr {O}(cnk)$, respectively. Calculating ${\textbf {P}}$ needs $\mathscr {O}(nd^2+d^3+knd+kd^2)$. The generation of ${\textbf {B}}_{te}$ requires $\mathscr {O}(kmd)$. The computation cost for the ultimate generation of the global semantic graph structure ${\textbf {A}}$ amounts to $\mathscr {O}((m+n)^2)$.

Experiments and discussions

To validate the effectiveness of the constructed hash similarity graph, a deep graph model framework is required. In this study, we employ the FastGCN model proposed by Chen et al.⁵¹. FastGCN treats the set of graph vertices as independent and identically distributed (i.i.d.) samples drawn from a probability distribution, thereby enabling consistent estimation of the loss gradient for parameter updates. The full GCN architecture is then used to compute embeddings for newly introduced vertices.
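
The sampling idea can be illustrated with a simplified Monte Carlo estimate of a single propagation step $\hat{{\textbf {A}}}{\textbf {H}}$. The sketch below is not the FastGCN code: it assumes vertices are drawn i.i.d. with probability proportional to the squared column norm of the adjacency matrix, and the name `sampled_propagation` and the random toy matrices are hypothetical.

```python
import numpy as np

def sampled_propagation(A_hat, H, t, rng):
    """Monte Carlo estimate of A_hat @ H from t i.i.d. sampled vertices."""
    col_norm_sq = np.sum(A_hat ** 2, axis=0)
    q = col_norm_sq / col_norm_sq.sum()  # assumed importance distribution
    idx = rng.choice(A_hat.shape[1], size=t, replace=True, p=q)
    # Reweight each sampled column by 1 / (t * q) so the estimate is unbiased.
    weights = 1.0 / (t * q[idx])
    return (A_hat[:, idx] * weights) @ H[idx, :]

rng = np.random.default_rng(0)
A_hat = rng.random((20, 20))  # toy dense adjacency matrix
H = rng.random((20, 4))       # toy vertex features
estimate = sampled_propagation(A_hat, H, t=5000, rng=rng)
exact = A_hat @ H
rel_err = np.linalg.norm(estimate - exact) / np.linalg.norm(exact)
```

In practice far fewer samples than vertices would be drawn per batch; the large `t` here only makes the agreement with the exact product easy to see.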

Experiment setting

(1)
Datasets:
- $\bullet$ MIT-BIH arrhythmia database⁵²: Each record in the dataset was continuously recorded for 30 minutes at a sampling rate of 360 samples per second. Each cycle within the dataset was annotated with a reference value based on the R-peak, serving as an analytical ground truth. For this study, a subset of 10,000 ECG fragments from lead II was selected and categorized into four classes: normal heartbeat (N), left bundle branch block heartbeat (L), right bundle branch block heartbeat (R), and ventricular premature beat (V). A training set of 8,000 randomly chosen samples was created, comprising 7,000 training instances and 1,000 validation instances. The remaining samples were allocated to the test set.
- $\bullet$ The AF ECG dataset¹²: It was obtained from the 2017 PhysioNet/CinC Challenge and comprises 8528 ECG segments ranging in duration from 30 to 60 seconds. The dataset is categorized into four groups: normal sinus rhythm (N), atrial fibrillation (A), other cardiac rhythm (O), and noise segment ($\sim$). For training, we randomly selected 7000 signals, and the remaining signals were used as the test set. Additionally, we extracted 500 signals from the training set for validation.
(2)
Compared methods: The detection performance of the constructed semantic hash similarity graph in the FastGCN network is verified by conducting experimental comparisons with several commonly used intelligent algorithms. These algorithms include machine learning methods such as long short-term memory based on CNN (CNN-LSTM)⁵³, bidirectional LSTM (BiLSTM)⁵⁴, one-dimensional convolutional neural networks (1DCNN)³, attentive recurrent neural networks (ARNN)⁴, GCN⁵⁵, CNN-BiLSTM-ATT², ST-ReGE¹⁹ and MPR-STSGCN¹⁰.
Fig. 2 Comparison results of all methods for each class of ACC on the dataset MIT.
Fig. 3 Comparison results of all methods for each class of ACC on the dataset AF.
Fig. 4 Convergence of algorithm 1 on two datasets.
Fig. 5 Comparison results between $\text {FastGCN}_{H}$ and $\text {FastGCN}_{D}$ on two datasets.
Fig. 6 The visualization comparison of semantic similarity graphs and label similarity graphs on MIT.
(3)
Metrics: The metrics employed in our study include the accuracy for each category, average accuracy (Acc), Precision, Recall (sensitivity), and Macro-F1 score, based on the AAMI guidelines. They are defined as follows:
$$\begin{aligned} \left\{ \begin{array}{lr} Acc=\frac{TP+TN}{TP+FP+FN+TN}, \\ Precision=\frac{TP}{TP+FP}, \\ Recall=\frac{TP}{TP+FN}, \\ F_{1,ma}=\frac{2Precision*Recall}{Precision+Recall}=\frac{2TP}{2TP+FP+FN}. \end{array} \right. \end{aligned}$$
(17)
where the abbreviations TP, FP, TN, and FN respectively stand for true positive, false positive, true negative, and false negative.
(4)
Experimental protocol: Algorithm 1 involves three parameters $\beta$, $\iota$ and $\lambda$, which are set to $\min \left( \max (10^{-2}, 10^{-3}\Vert {\textbf {B}}^T{\textbf {B}}\Vert _F^2),10^5\right)$, $1e-4$ and $1e-1$, respectively. The hash code length k is set to 32. For FastGCN, the number of epochs is set to 30, the number of hidden units to 256, the ADAM learning rate to $10^{-2}$, the batch size to 200, and the resample size to 400. Since FastGCN has only two layers, we kernelize the samples in advance to improve its nonlinearity. In addition, to facilitate the reproducibility of the FastGCN experiments, we provide the two datasets along with their corresponding semantic hash graphs (https://pan.baidu.com/s/19-APd8zAYim2hBr6YvCutg?pwd=7neo). All experiments are performed on a workstation with an Intel(R) Core(TM) i7-9700 CPU@3.00GHz, an NVIDIA GeForce RTX 3070, and 64 GB RAM.
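
The definitions in Eq. (17) can be sketched for the multi-class case by treating each class one-vs-rest and averaging. The sketch assumes unweighted (macro) averaging over classes, which the text implies for F1 but does not spell out for the other metrics; the function name `per_class_metrics` and the toy labels are hypothetical.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """One-vs-rest Acc, Precision, Recall and F1 per class, then macro-averaged."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc, prec, rec, f1 = [], [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        acc.append((tp + tn) / (tp + fp + fn + tn))
        # Guard against empty denominators for classes never predicted or absent.
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
        rec.append(tp / (tp + fn) if tp + fn else 0.0)
        f1.append(2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0)
    return {
        "acc": float(np.mean(acc)),
        "precision": float(np.mean(prec)),
        "recall": float(np.mean(rec)),
        "macro_f1": float(np.mean(f1)),
    }

# Toy example with three classes and six predictions.
scores = per_class_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
```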

Analysis of experimental results

(1)
MIT-BIH: The Acc, Precision, Recall and F1 score results of all baselines on MIT-BIH are listed in Table 1, and the per-class Acc of all baselines is shown in Fig. 2. The results indicate that the class features in this dataset are highly discernible, so all methods achieve relatively high detection performance. Our method is superior to the other methods in terms of Acc, precision, and F1 score, while its recall is slightly lower than that of ARNN. All the benchmarks are deep learning methods, and their uniformly satisfactory results suggest that deep architectures are effective at extracting critical signal features; however, deep networks require longer training time. ARNN achieves the second-highest performance among all methods, which suggests that an attention mechanism in time-series networks is well suited to analyzing ECG signals. Despite having only two layers, FastGCN outperforms the other baselines in recognition results. This indirectly suggests that the semantic hash similarity graph we constructed is rich in semantic information, which greatly facilitates efficient detection by the FastGCN model.
Table 1 Comparison results of various methods on two data sets. Bold text indicates the best result and italic text indicates the second-best result.

(2)
AF dataset: The baseline comparison results are summarized in Table 1, while Fig. 3 displays the accuracy (ACC) for each class. The results in Table 1 show that our method outperforms all baselines in terms of ACC, recall, and F1 score by a wide margin. This also demonstrates the effectiveness of the proposed semantic hash graph learning framework in mitigating the inherent rigidity of the label space and in addressing incomplete or redundant adjacency within the similarity graph. However, the precision results are comparatively weaker. The absence of an attention module in the FastGCN model may contribute to this, and the gap may also reflect class imbalance.

The results of each baseline method vary significantly across classes because of the imbalance in the number of samples per class in this dataset, as illustrated in Fig. 3. The recognition rates for normal rhythm (N) and atrial fibrillation (A) are the highest, while those for the other two types are unsatisfactory. Our method performs better in the first and fourth categories, while CNN-BiLSTM-ATT leads significantly in the second category but performs poorly in the fourth. The figure shows that our method, ST-ReGE, and MPR-STSGCN achieve relatively high recognition accuracy in every category, indicating that these three methods are not sensitive to class imbalance even though the dataset is highly imbalanced. A potential reason is that the parameters of an attention module are updated solely through loss backpropagation from labels and predicted values, without any additional supervision information being introduced. Consequently, its supervision is constrained, rendering it susceptible to label overfitting.

Convergence analysis and training time

The convergence of Algorithm 1 on the two datasets is investigated, and the results are depicted in Fig. 4. A remarkably rapid convergence rate (fewer than 5 iterations) is observed, primarily because the hash codebook is extracted from the label space rather than from semantic associations. Although optimizing the orthogonal rotation factor is more complex, the factor is small, so the optimization is very fast. The convergence curve on MIT also shows that the objective function value does not decrease strictly. The primary reason is that updating the hash code amounts to selecting the optimal vertex of the k-dimensional hypercube; this step does not necessarily reduce the objective function value and may even increase it. Furthermore, the rotation factor ${\textbf {R}}$ is much smaller in size than ${\textbf {B}}$, so its contribution to the objective value is much less significant than that of ${\textbf {B}}$. Therefore, we only need to ensure that the algorithm converges normally. In the appendix, we give a rigorous proof that updating the rotation factor ${\textbf {R}}$ guarantees a reduction of the objective value.

The training time and retrieval time of all baselines are also examined. For fairness, the number of epochs for each network is set to 30. The training and retrieval times for all baselines are listed in Table 2, which shows that our method is faster than the other deep learning approaches. Notably, constructing the global similarity graph with semantic hash similarity requires only a few seconds, which is an acceptable cost.

Table 2 Comparison of training time and retrieval time on two data sets. Bold text indicates the best result and italic text indicates the second-best result.

Ablation analysis

The contribution of the semantic hash similarity graph to the experimental performance is further analyzed through ablation experiments on the two datasets. For convenience, $\text {FastGCN}_{H}$ denotes the version that uses the combined semantic hash graph and $\text {FastGCN}_{D}$ denotes its ablated version. The $\text {FastGCN}_{D}$ version is obtained by setting $\kappa =1$ in our method, so that only the dynamic similarity graph is used.

The experimental comparison of the two methods on the two datasets is presented in Fig. 5. The findings show that incorporating the semantic hash graph significantly enhances the performance of FastGCN, confirming that the learned graph contains abundant semantic information for effective class separation.

To make it more intuitive, we selected the first 100 signals from each of the two datasets and compared their semantic similarity graphs and label similarity graphs, as shown in Figs. 6 and 7. The extracted semantic similarity graphs fit the label similarities very well.

Parameter sensitivity

The above experiments were conducted with all parameters preconfigured. We now examine the impact of parameter variations on the outcomes using the two original datasets. Our focus is not an extensive exploration of network parameters, but the effect of the parameters of the semantic hash graph model on the results.

The weight parameter $\kappa$ between the semantic similarity graph and the dynamic structural similarity graph has the most significant impact on the outcome. The effect of varying this parameter is illustrated in Fig. 8. The results indicate that $\kappa$ is best chosen in the interval [0.1, 0.5] on MIT-BIH and in the interval [0.1, 0.6] on AF.

For the penalty parameter $\beta$, we have set a general adjustment range, which we do not investigate further here. The experiments show that varying $\iota$ has no noticeable impact on the results, which remained remarkably stable. However, for the sake of convergence, it is better to choose a small step size, such as $\iota \le 1e-3$. These two parameters affect only the algorithm’s convergence speed, without significantly affecting the outcome.

The regularization parameter $\lambda$ is solely utilized to prevent the linear projection function from overfitting. All experiments in this paper are based on $\lambda =1e-1$, but from the sensitivity experiment of $\lambda$ in Fig. 9, it can be seen that taking $\lambda =2e-1$ is the best choice on the two datasets.

The length of the hash code plays a crucial role in capturing signal semantics. A longer hash code length typically encompasses richer semantic information. As illustrated in Fig. 9, increasing the hash code length k enhances the semantic similarity graph, which in turn significantly boosts the retrieval performance of the FastGCN network. However, longer hash codes come at the cost of increased computational complexity. Despite Fig. 9 indicating that $k=128$ yields the optimal results, this study opts for $k=32$ to strike a balance between retrieval effectiveness and computational efficiency.

Noise sensitivity

Furthermore, to verify the noise resistance of our graph construction method, we add Gaussian white noise with a signal-to-noise ratio (SNR) of 10 to each of the two datasets. We then compare, within the FastGCN framework, the graphs constructed on the original data using the KNN method and our proposed method, and repeat the comparison for the two graph structures built on the noisy data. The experimental results are shown in Fig. 10. On MIT-BIH, our results dropped by 3% under noise, while the KNN-based results decreased by 11%. On AF, our results dropped by 3 percentage points under noise, whereas the KNN-based results decreased by 12 percentage points. The results demonstrate that our graph learning method is capable of noise mitigation, primarily because hash coding can effectively diminish the impact of noise.
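
The noise injection step can be sketched as follows. The text does not state the unit of the SNR; the sketch assumes decibels, and the function name `add_gaussian_noise` and the synthetic sine signal are hypothetical stand-ins for an ECG fragment.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that the expected SNR equals snr_db (assumed dB)."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 20 * np.pi, 100_000))  # synthetic stand-in signal
noisy = add_gaussian_noise(clean, snr_db=10, rng=rng)
# Empirical SNR of the noisy signal, for a sanity check against the target.
measured_snr_db = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```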

Conclusion

Existing graph structure learning faces a dual dilemma: it over-relies on local topological relations while ignoring global semantic similarity, and it struggles to capture node association features in sparse data scenarios. To address this, this paper proposes a supervised Semantic Hash Similarity Graph (SHSG) learning framework. The framework systematically models the internal feature associations and cross-sample semantic similarity of ECG signals through a multi-level semantic fusion strategy, and provides a strongly discriminative initial graph structure for deep graph neural networks. SHSG first constructs a compact hash representation of supervised signals based on semantic consistency constraints from label supervision. Second, a lightweight linear hash function is designed to generate a generalized representation of unseen signals. Then, the hash space embeddings of the training and test samples are integrated to construct a global hash dictionary. Finally, the graph topology is generated based on Hamming similarity. To verify the effectiveness of the proposed method, experiments are carried out on the two-layer FastGCN architecture. The experimental results verify the dual advantages of the proposed method in terms of feature characterization ability and computational efficiency.

Next, we will focus on the construction of unsupervised semantic similarity graphs and combine them with more advanced graph neural network (GNN) architectures, such as graph attention networks and graph isomorphism networks, to further enhance feature extraction and learning capabilities.

Data Availability

Data is provided within the manuscript or supplementary information files.

References

Morokuma, S. et al. Prediction of ecg signals from ballistocardiography using deep learning for the unconstrained measurement of heartbeat intervals. Sci. Rep. 15, 999 (2025).
Lin, H. et al. A new method for heart rate prediction based on lstm-bilstm-att. Measurement 207, 112384. https://doi.org/10.1016/j.measurement.2022.112384 (2023).
Sabor, N., Gendy, G., Mohammed, H., Wang, G. & Lian, Y. Robust arrhythmia classification based on qrs detection and a compact 1d-cnn for wearable ecg devices. IEEE J. Biomed. Health Inform. 26, 5918–5929. https://doi.org/10.1109/JBHI.2022.3207456 (2022).
Prabhakararao, E. & Dandapat, S. Attentive rnn-based network to fuse 12-lead ecg and clinical features for improved myocardial infarction diagnosis. IEEE Signal Process. Lett. 27, 2029–2033. https://doi.org/10.1109/LSP.2020.3036314 (2020).
Li, W., Tang, Y. M., Yu, K. M. & To, S. Slc-gan: An automated myocardial infarction detection model based on generative adversarial networks and convolutional neural networks with single-lead electrocardiogram synthesis. Inf. Sci. 589, 738–750. https://doi.org/10.1016/j.ins.2021.12.083 (2022).
Dohare, A. K., Kumar, V. & Kumar, R. Detection of myocardial infarction in 12 lead ecg using support vector machine. Appl. Soft Comput. 64, 138–147. https://doi.org/10.1016/j.asoc.2017.12.001 (2018).
Arora, H. & Sinha, R. Multi-modal graph neural networks for physiological signal analysis. In 2025 4th International Conference on Sentiment Analysis and Deep Learning (ICSADL), 1202–1208. https://doi.org/10.1109/ICSADL65848.2025.10933022 (2025).
Shiyilin, Y., Jie, S., Xin, Y., Xin, C. & Xingxing, W. Ecg arrhythmias classification with a graph bispectrum method. In 2023 International Conference on Intelligent Supercomputing and BioPharma (ISBP), 104–108. https://doi.org/10.1109/ISBP57705.2023.10061314 (2023).
Chen, J. et al. Graph-enhanced low-resource ecg representation learning for emotion recognition based on wearable internet of things. IEEE Internet Things J. 11, 39056–39068. https://doi.org/10.1109/JIOT.2024.3430297 (2024).
Chen, Y., Qiu, S., Wang, Z., Zhao, H. & Cao, X. Multiperceptive region of spatial temporal graph convolutional shrinkage network for arrhythmia recognition. IEEE Trans. Instrum. Meas. 73, 1–11. https://doi.org/10.1109/TIM.2024.3376017 (2024).
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80. https://doi.org/10.1109/TNN.2008.2005605 (2009).
Clifford, G. D. et al. Af classification from a short single lead ecg recording: The physionet/computing in cardiology challenge 2017. In 2017 Computing in Cardiology (CinC), 1–4. https://doi.org/10.22489/CinC.2017.065-469 (2017).
Zhang, D., Wu, X.-J. & Yu, J. Discrete bidirectional matrix factorization hashing for zero-shot cross-media retrieval. In Pattern Recognition and Computer Vision (eds Ma, H. et al.) 524–536 (Springer, Cham, 2021).
Gao, X., Chen, Z., Zhang, B. & Wei, J. Deep learning to hash with application to cross-view nearest neighbor search. IEEE Trans. Circuits Syst. Video Technol. 35, 3882–3892. https://doi.org/10.1109/TCSVT.2023.3273400 (2025).
Zhang, D. & Wu, X.-J. Robust and discrete matrix factorization hashing for cross-modal retrieval. Pattern Recognit. 122, 108343. https://doi.org/10.1016/j.patcog.2021.108343 (2022).
Zhang, D., Wu, X.-J. & Yu, J. Label consistent flexible matrix factorization hashing for efficient cross-modal retrieval. ACM Trans. Multimed. Comput. Commun. Appl. https://doi.org/10.1145/3446774 (2021).
Zhang, B. et al. Unsupervised dual deep hashing with semantic-index and content-code for cross-modal retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 47, 387–399. https://doi.org/10.1109/TPAMI.2024.3467130 (2025).
Zhang, D. & Wu, X.-J. Scalable discrete matrix factorization and semantic autoencoder for cross-media retrieval. IEEE Trans. Cybern. 52, 5947–5960. https://doi.org/10.1109/TCYB.2020.3032017 (2022).
Zhang, H. et al. St-rege: A novel spatial-temporal residual graph convolutional network for cvd. IEEE J. Biomed. Health Inform. 28, 216–227. https://doi.org/10.1109/JBHI.2023.3327025 (2024).
Iqbal, S. et al. Fusiongcnn: An iot-based novel spatiotemporal graph convolutional network for ecg arrhythmia detection. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2025.3560344 (2025).
Zhu, Y. et al. A survey on graph structure learning: Progress and opportunities (2021). arXiv:2103.03036.
Shen, Z., Wang, S. & Kang, Z. Beyond redundancy: Information-aware unsupervised multiplex graph structure learning (2024). arXiv:2409.17386.
Shen, Z. & Kang, Z. When heterophily meets heterogeneous graphs: Latent graphs guided unsupervised representation learning (2024). arXiv:2409.00687.
Chen, Y., Wu, L. & Zaki, M. Iterative deep graph learning for graph neural networks: Better and robust node embeddings. In Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) 19314–19326 (Curran Associates Inc, 2020).
Li, M. et al. Gsgsl: Gravity-driven self-supervised graph structure learning. Inf.Process. Manag. 61, 103744. https://doi.org/10.1016/j.ipm.2024.103744 (2024).
Liu, F. & Liu, W. Graph neural network based multi-instance learning with graph structure learning. In 2024 7th International Conference on Artificial Intelligence and Big Data (ICAIBD), 505–510. https://doi.org/10.1109/ICAIBD62003.2024.10604509 (2024).
Preparata, F. P. & Shamos, M. I. Computational Geometry: An Introduction (Springer, 1985).
Bentley, J. L., Stanat, D. F. & Williams, E. The complexity of finding fixed-radius near neighbors. Inf. Process. Lett. 6, 209–212. https://doi.org/10.1016/0020-0190(77)90070-9 (1977).
Zhu, Y., Xu, Y., Yu, F., Wu, S. & Wang, L. Cagnn: Cluster-aware graph neural networks for unsupervised graph representation learning (2020). arXiv:2009.01674.
Louizos, C., Welling, M. & Kingma, D. P. Learning sparse neural networks through $l_0$ regularization (2018). arXiv:1712.01312.
Ortega, A., Frossard, P., Kovacevic, J., Moura, J. M. F. & Vandergheynst, P. Graph signal processing: Overview, challenges, and applications. Proc. IEEE 106, 808–828. https://doi.org/10.1109/JPROC.2018.2820126 (2018).
Cai, J.-F., Candès, E. J. & Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982. https://doi.org/10.1137/080738970 (2010).
Cai, D., He, X., Han, J. & Huang, T. S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1548–1560. https://doi.org/10.1109/TPAMI.2010.231 (2011).
Fan, F., Jing, P., Nie, L., Gu, H. & Su, Y. Sadcmf: Self-attentive deep consistent matrix factorization for micro-video multi-label classification. IEEE Trans. Multimed. 26, 10331–10341. https://doi.org/10.1109/TMM.2024.3406196 (2024).
Fan, F. et al. Dual-domain aligned deep hierarchical matrix factorization method for micro-video multi-label classification. IEEE Trans. Multimed. 26, 2598–2607. https://doi.org/10.1109/TMM.2023.3301224 (2024).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates Inc, 2017).
Kreuzer, D., Beaini, D., Hamilton, W. L., L’etourneau, V. & Tossou, P. Rethinking graph transformers with spectral attention. arXiv:abs/2106.03893 (2021).
Dwivedi, V. P., Luu, A. T., Laurent, T., Bengio, Y. & Bresson, X. Graph neural networks with learnable structural and positional representations. arXiv:abs/2110.07875 (2021).
Lu, J., Zhou, J., Chen, Y., Pedrycz, W. & Hung, K.-W. Asymmetric transfer hashing with adaptive bipartite graph learning. IEEE Trans. Cybern. 54, 533–545. https://doi.org/10.1109/TCYB.2022.3232787 (2024).
Jing, P., Sun, H., Nie, L., Li, Y. & Su, Y. Deep multi-modal hashing with semantic enhancement for multi-label micro-video retrieval. IEEE Trans. Knowl. Data Eng. 36, 5080–5091. https://doi.org/10.1109/TKDE.2023.3337077 (2024).
Huang, L., Tao, Y., Qin, C. & Zhang, X. Robust hashing for neural network models via heterogeneous graph representation. IEEE Signal Process. Lett. 31, 2640–2644. https://doi.org/10.1109/LSP.2024.3465898 (2024).
Liu, H., Wei, Y., Yin, J. & Nie, L. Hs-gcn: Hamming spatial graph convolutional networks for recommendation. IEEE Trans. Knowl. Data Eng. 35, 5977–5990. https://doi.org/10.1109/TKDE.2022.3158317 (2023).
Wang, S., Li, C. & Shen, H.-L. Distributed graph hashing. IEEE Trans. Cybern. 49, 1896–1908. https://doi.org/10.1109/TCYB.2018.2816791 (2019).
Wang, Y. et al. Batch: A scalable asymmetric discrete cross-modal hashing. IEEE Transactions on Knowledge and Data Engineering 33, 3507–3519. https://doi.org/10.1109/TKDE.2020.2974825 (2021).
Wang, Y., Chen, Z.-D., Luo, X. & Xu, X.-S. A high-dimensional sparse hashing framework for cross-modal retrieval. IEEE Trans. Circuits Syst. Video Technol. 32, 8822–8836. https://doi.org/10.1109/TCSVT.2022.3195874 (2022).
Wang, L., Zareapoor, M., Yang, J. & Zheng, Z. Asymmetric correlation quantization hashing for cross-modal retrieval. IEEE Trans. Multimed. 24, 3665–3678. https://doi.org/10.1109/TMM.2021.3105824 (2022).
Yang, F. et al. Semantic preserving asymmetric discrete hashing for cross-modal retrieval. Appl. Intell. 53, 15352–15371. https://doi.org/10.1007/s10489-022-04282-w (2023).
Gao, B., Liu, X. & Xiang Yuan, Y. Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 41, 1949–1983 (2018).
Gao, B., Liu, X., Chen, X. & Yuan, Y.-X. A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. 28, 302–332. https://doi.org/10.1137/16M1098759 (2018).
Sakoe, H. & Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26, 43–49. https://doi.org/10.1109/TASSP.1978.1163055 (1978).
Chen, J., Ma, T. & Xiao, C. Fastgcn: Fast learning with graph convolutional networks via importance sampling. CoRR abs/1801.10247 (2018). arXiv:1801.10247.
Yang, W., Si, Y., Wang, D. & Gong, Z. A novel method for identifying electrocardiograms using an independent component analysis and principal component analysis network. Measurement 152, 107363. https://doi.org/10.1016/j.measurement.2019.107363 (2019).
Chen, C., Hua, Z., Zhang, R., Liu, G. & Wen, W. Automated arrhythmia classification based on a combination network of cnn and lstm. Biomed. Signal Process. Control 57, 101819. https://doi.org/10.1016/j.bspc.2019.101819 (2020).
Yildirim, Özal. A novel wavelet sequence based on deep bidirectional lstm network model for ecg signal classification. Comput. Biol. Med. 96, 189–202. https://doi.org/10.1016/j.compbiomed.2018.03.016 (2018).
Jiang, Z. et al. Diagnostic of multiple cardiac disorders from 12-lead ecgs using graph convolutional network based multi-label classification. In 2020 Computing in Cardiology, 1–4. https://doi.org/10.22489/CinC.2020.135 (2020).


Acknowledgements

This work was partially supported by the National Science Foundation of China (62271293), the Natural Science Foundation of Shandong Province, PR China (ZR2021MF035), the Social Science Planning Project of Shandong Province, PR China (22CYYJ13).

Author information

Y. Fang and Y. Ren have contributed equally to this work.

Authors and Affiliations

School of Information Engineering, Shandong Management University, Jinan, 250357, China
Yixian Fang
School of Information Science and Engineering (Institute of Data Science and Technology), Shandong Normal University, Jinan, 250014, China
Shilin Zhang & Yuwei Ren

Contributions

Yixian Fang and Shilin Zhang: concept and design of the study. Yixian Fang wrote the manuscript. Yuwei Ren: data collection and pre-analysis. Yixian Fang and Yuwei Ren: funding acquisition, supervision, Writing - review & editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yixian Fang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Analysis of algorithm 1 convergence

The purpose of updating hash codes is to select the optimal vertex on a k-dimensional hypercube network, which does not significantly impact target descent. Therefore, our main focus is on discussing the convergence of rotation factor ${\textbf {R}}$.

Consider the orthogonal feasible solution $\widetilde{{\textbf {r}}_i}={\textbf {r}}_i -\iota \nabla f_i({\textbf {r}}_i)\in Span\{{\textbf {r}}_i,\nabla f_i({\textbf {r}}_i)\}$, which is equivalent to $\widetilde{{\textbf {r}}_i}\in \mathscr {B}({\textbf {r}}_i -\iota \nabla f_i({\textbf {r}}_i),\iota \Vert \nabla f_i({\textbf {r}}_i)\Vert _2)$. On this basis, we can derive the following theorem.

Theorem 1

For any $\widetilde{{\textbf {r}}_i}\in \mathscr {B}({\textbf {r}}_i -\iota \nabla f_i({\textbf {r}}_i),\iota \Vert \nabla f_i({\textbf {r}}_i)\Vert _2)$, there holds that

$$\begin{aligned} f_i(\widetilde{{\textbf {r}}_i})\le f_i({\textbf {r}}_i). \end{aligned}$$

(18)

Proof

For any $\widetilde{{\textbf {r}}_i}\in \mathscr {B}({\textbf {r}}_i -\iota \nabla f_i({\textbf {r}}_i),\iota \Vert \nabla f_i({\textbf {r}}_i)\Vert _2)$, we have

$$\begin{aligned} \langle \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i, \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i+2\iota \nabla f_i({\textbf {r}}_i)\rangle \le 0. \end{aligned}$$

(19)

That is because

$$\begin{aligned}&(\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i)^T(\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i+2\iota \nabla f_i({\textbf {r}}_i)) \nonumber \\&\quad =\Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i\Vert _2^2+2\iota (\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i)^T\nabla f_i({\textbf {r}}_i) \nonumber \\&\quad \le \Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i\Vert _2^2+2\iota (-\iota \nabla f_i({\textbf {r}}_i))^T\nabla f_i({\textbf {r}}_i) \nonumber \\&\quad =\Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i\Vert _2^2-2\iota ^2\Vert \nabla f_i({\textbf {r}}_i)\Vert _2^2 \nonumber \\&\quad \le \iota ^2\Vert \nabla f_i({\textbf {r}}_i)\Vert _2^2-2\iota ^2\Vert \nabla f_i({\textbf {r}}_i)\Vert _2^2\le 0. \end{aligned}$$

(20)
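
As a complementary sketch (not part of the original proof), inequality (19) can also be read off directly from the definition of the ball, assuming $\mathscr {B}$ denotes the closed Euclidean ball. Membership of $\widetilde{{\textbf {r}}_i}$ means that its distance to the center ${\textbf {r}}_i -\iota \nabla f_i({\textbf {r}}_i)$ is at most the radius, and expanding the square gives

$$\begin{aligned}&\Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i+\iota \nabla f_i({\textbf {r}}_i)\Vert _2^2\le \iota ^2\Vert \nabla f_i({\textbf {r}}_i)\Vert _2^2 \nonumber \\&\quad \Longleftrightarrow \ \Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i\Vert _2^2+2\iota \langle \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i, \nabla f_i({\textbf {r}}_i)\rangle +\iota ^2\Vert \nabla f_i({\textbf {r}}_i)\Vert _2^2\le \iota ^2\Vert \nabla f_i({\textbf {r}}_i)\Vert _2^2 \nonumber \\&\quad \Longleftrightarrow \ \langle \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i, \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i+2\iota \nabla f_i({\textbf {r}}_i)\rangle \le 0. \end{aligned}$$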

Using the second-order Taylor expansion of $f_i$, we have

$$\begin{aligned} f_i(\widetilde{{\textbf {r}}_i})= f_i({\textbf {r}}_i)+\langle \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i, \nabla f_i({\textbf {r}}_i)\rangle +\frac{1}{2}(\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i)^T\nabla ^2 f_i({\textbf {r}}_i)(\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i) \end{aligned}$$

(21)

Since $f_i$ is a convex function of ${\textbf {r}}_i$, the Hessian matrix $\nabla ^2 f_i({\textbf {r}}_i)$ is symmetric positive definite, and hence there exists an invertible matrix ${\textbf {Q}}$ that satisfies $\nabla ^2 f_i({\textbf {r}}_i)={\textbf {Q}}^T{\textbf {Q}}$. So $(\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i)^T\nabla ^2 f_i({\textbf {r}}_i)(\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i)=\Vert {\textbf {Q}}(\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i)\Vert _2^2\le \rho \Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i\Vert _2^2$, where $\rho =\Vert {\textbf {Q}}\Vert _F^2$. Therefore, combining Eqs. (19) and (21), we have

$$\begin{aligned} f_i(\widetilde{{\textbf {r}}_i})&\le f_i({\textbf {r}}_i)+\langle \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i, \nabla f_i({\textbf {r}}_i)\rangle +\frac{\rho }{2}\Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i\Vert _2^2 \nonumber \\&=f_i({\textbf {r}}_i)+\frac{1}{2\iota }\langle \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i, \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i+2\iota \nabla f_i({\textbf {r}}_i)\rangle -\frac{\iota ^{-1}-\rho }{2}\Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i\Vert _2^2 \nonumber \\&\le f_i({\textbf {r}}_i)-\frac{\iota ^{-1}-\rho }{2}\Vert \widetilde{{\textbf {r}}_i}-{\textbf {r}}_i\Vert _2^2. \end{aligned}$$

(22)
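
The middle equality in (22) is a pure regrouping of terms. As a short sketch, write ${\textbf {d}}=\widetilde{{\textbf {r}}_i}-{\textbf {r}}_i$ and ${\textbf {g}}=\nabla f_i({\textbf {r}}_i)$; these two shorthand symbols are introduced here only for illustration and do not appear in the original text. Then

$$\begin{aligned} \frac{1}{2\iota }\langle {\textbf {d}}, {\textbf {d}}+2\iota {\textbf {g}}\rangle -\frac{\iota ^{-1}-\rho }{2}\Vert {\textbf {d}}\Vert _2^2&=\frac{1}{2\iota }\Vert {\textbf {d}}\Vert _2^2+\langle {\textbf {d}}, {\textbf {g}}\rangle -\frac{1}{2\iota }\Vert {\textbf {d}}\Vert _2^2+\frac{\rho }{2}\Vert {\textbf {d}}\Vert _2^2 \nonumber \\&=\langle {\textbf {d}}, {\textbf {g}}\rangle +\frac{\rho }{2}\Vert {\textbf {d}}\Vert _2^2, \end{aligned}$$

which matches the first line of (22). The final inequality then drops the inner-product term, which is non-positive by (19).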

Therefore, as long as the step size satisfies $\iota \in (0, \rho ^{-1})$, the coefficient $\frac{\iota ^{-1}-\rho }{2}$ is positive, and we obtain $f_i(\widetilde{{\textbf {r}}_i})\le f_i({\textbf {r}}_i)$.
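
The descent property can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a quadratic objective $f({\textbf {r}})=\frac{1}{2}\Vert {\textbf {Q}}{\textbf {r}}\Vert _2^2-\langle {\textbf {b}},{\textbf {r}}\rangle$ (so that the Hessian is exactly ${\textbf {Q}}^T{\textbf {Q}}$ and the Taylor expansion (21) is exact), picks $\iota =0.5\rho ^{-1}$, samples points from the ball in Theorem 1, and counts how often $f$ increases. The names `make_quadratic`, `count_violations`, and the matrix and vector values are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def make_quadratic(Q, b):
    """Return f(r) = 0.5*||Q r||^2 - <b, r> and its gradient Q^T Q r - b."""
    Qt = [list(col) for col in zip(*Q)]

    def f(r):
        Qr = matvec(Q, r)
        return 0.5 * dot(Qr, Qr) - dot(b, r)

    def grad(r):
        return [g - bi for g, bi in zip(matvec(Qt, matvec(Q, r)), b)]

    return f, grad


def count_violations(Q, b, r, trials=1000, seed=0):
    """Count sampled ball points where f increases (the theorem predicts none)."""
    rng = random.Random(seed)
    f, grad = make_quadratic(Q, b)
    rho = sum(x * x for row in Q for x in row)  # rho = ||Q||_F^2
    iota = 0.5 / rho                            # step size inside (0, 1/rho)
    g = grad(r)
    center = [ri - iota * gi for ri, gi in zip(r, g)]
    radius = iota * math.sqrt(dot(g, g))
    violations = 0
    for _ in range(trials):
        # Random direction with random length in [0, radius]: a point in the ball.
        d = [rng.gauss(0.0, 1.0) for _ in r]
        scale = radius * rng.random() / math.sqrt(dot(d, d))
        r_tilde = [c + scale * di for c, di in zip(center, d)]
        # Small tolerance absorbs floating-point rounding.
        if f(r_tilde) > f(r) + 1e-12:
            violations += 1
    return violations


# Illustrative (assumed) problem data.
Q = [[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.5]]
b = [1.0, -2.0, 0.5]
r = [0.7, -0.2, 1.1]
print(count_violations(Q, b, r))
```

The sampling is not uniform over the ball, which is sufficient here because the theorem is stated for every point of the ball. The check does not cover non-quadratic objectives, where (21) holds only approximately.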

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Fang, Y., Zhang, S. & Ren, Y. Semantic ECG hash similarity graph. Sci Rep 15, 23791 (2025). https://doi.org/10.1038/s41598-025-07838-1

Received: 17 February 2025
Accepted: 17 June 2025
Published: 03 July 2025
Version of record: 03 July 2025
DOI: https://doi.org/10.1038/s41598-025-07838-1
