Abstract
The potential of the sequence in Coronary Angiography (CA) frames for diagnosing coronary artery disease (CAD) has been largely overlooked. Our study aims to reveal the “Sequence Value” embedded within these frames and to explore methods for its application in diagnostics. We conduct a survey via Amazon Mechanical Turk (MTurk) to evaluate the effectiveness of Sequence Restoration Capability in indicating CAD. Furthermore, we develop a self-supervised deep learning model to automatically assess this capability. Additionally, we check the robustness of our results by repeating the statistical analysis with different selections of coronary angiographies and modules. Our self-supervised deep learning model achieves an average AUC of 80.1% across five-fold validation, is robust against static data noise, and is efficient, with calculations completed within 30 s. This study uncovers significant insights into CAD diagnosis through the sequence value in coronary angiography. We successfully illustrate methodologies for harnessing this potential, contributing valuable knowledge to the field.
Introduction
Coronary angiography is widely recognized as the gold standard for diagnosing coronary artery disease (CAD), providing dynamic imaging sequences (videos) of contrast-enhanced coronary arteries1. In recent years, artificial intelligence (AI), particularly deep learning, has emerged as a transformative tool for automating the analysis of coronary angiography2. However, most existing studies focus on identifying high-quality static frames for analysis, overlooking the rich temporal dynamics inherent in angiography sequences.
Recent advancements highlight the untapped potential of temporal information in improving diagnostic accuracy. Molenaar et al.3 proposed that leveraging temporal consistency across angiography sequences can enhance stenosis detection performance, increasing F1 scores by 30–40%. Wu et al.4 developed a convolutional neural network (CNN) with temporal constraints, integrating sequential frame analysis to reduce false positives and achieve an 87.2% sensitivity and 79.5% positive predictive value. Similarly, Pang et al.5 proposed Stenosis-DetNet, which employs sequence feature fusion and consistency alignment modules to exploit spatiotemporal patterns, achieving 94.87% precision and 82.22% sensitivity. These studies collectively underscore the diagnostic value of temporal dynamics, enabling robust lesion detection by mitigating transient artifacts and enhancing feature representation through inter-frame correlations.
Despite these advances, a critical gap remains: current methods primarily use temporal information to refine static-frame predictions rather than directly linking sequence-level understanding to disease severity. To address this, we propose a novel diagnostic paradigm that evaluates the ability to restore temporally shuffled angiography sequences. By correlating sequence restoration capability with clinical outcomes, this approach inherently captures the integrity of coronary hemodynamics and structural coherence, offering a holistic framework for assessing CAD severity. This method not only aligns with the growing emphasis on temporal dynamics in AI-driven angiography analysis3,4,5 but also introduces a novel biomarker for personalized CAD management.
Due to the invasive nature of coronary angiography, its application is often tiered. Generally, coronary CTA is used for initial screening, and patients who are healthy or have only mild disease do not require further coronary angiography.6 This means that patients undergoing coronary angiography typically have more severe conditions. We use the SYNTAX score as the ground truth of our study. The SYNTAX score is a scoring system used to evaluate the complexity of coronary artery lesions and is widely applied in clinical decision-making.7,8 This score is composed of multiple factors, including the degree of stenosis in the coronary arteries, the number of branches affected, the location of lesions (such as the trunk, left anterior descending artery, right coronary artery, etc.), and whether there are special conditions such as calcification or distortion.10 We uncover a link between Sequence Restoration Capability and the SYNTAX score. Additionally, the automated method developed in this study to link Sequence Restoration Capability with the SYNTAX score is based on a sequence-to-sequence encoder-decoder recurrent neural network model.12 Using this deep learning technology, the study constructs a self-supervised (label-free) model that automatically restores shuffled frames, outputs restoration accuracy scores, and transforms these into SYNTAX-based diagnostic criteria to support real-time decisions between Percutaneous Coronary Intervention (PCI) and Coronary Artery Bypass Grafting (CABG).11 In practice, once coronary angiography is performed, physicians must decide during the angiography process whether to proceed with PCI, a less invasive procedure, or recommend CABG, a more extensive surgical intervention. Currently, this decision is based on a combination of visual interpretation of angiographic images, clinical guidelines, and the SYNTAX score.
However, the process can be time-consuming, subjective, and prone to variability depending on the physician’s experience and expertise.11 The most immediate benefit of automating the assessment of CAD severity through the restoration accuracy of disordered angiography frames is the ability to assist physicians in making more informed, timely decisions during the angiography process.
The research objective of this study is to explore a novel approach for diagnosing CAD from a completely different perspective. Our method opens a new avenue for automated coronary angiography analysis and provides real-time decision support for physicians in CAD diagnosis.
Methods
Outcome measure
Several key studies, such as the SYNTAX trial, have demonstrated that patients with SYNTAX scores ≥ 23 derive significant benefit from CABG over PCI in terms of survival and overall cardiovascular outcomes. Many clinical guidelines therefore suggest CABG for patients with complex disease, defined by a SYNTAX score ≥ 23, to optimize treatment efficacy and minimize risks. In this study, we use SYNTAX score > 0 and SYNTAX score ≥ 23 as the outcome measures, following the “2018 ESC/EACTS Guidelines on myocardial revascularization”.9 For each coronary angiogram segment in the study, the final SYNTAX score is determined by averaging the scores provided by three attending physicians. We verify the correlation between Sequence Restoration Capability and SYNTAX score in two ways: a survey and a self-supervised deep learning model.
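The outcome definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `syntax_outcomes`, the dictionary keys, and the example reader scores are hypothetical and not taken from the study; only the averaging over three readers and the two cut-offs (> 0 and ≥ 23) come from the text.

```python
from statistics import mean

CABG_THRESHOLD = 23.0  # SYNTAX cut-off quoted in the text


def syntax_outcomes(reader_scores, cabg_threshold=CABG_THRESHOLD):
    """Average per-reader SYNTAX scores and derive the two binary outcomes."""
    if not reader_scores:
        raise ValueError("at least one reader score is required")
    final_score = mean(reader_scores)
    return {
        "syntax_score": final_score,
        "any_disease": final_score > 0,                    # outcome 1: SYNTAX > 0
        "complex_disease": final_score >= cabg_threshold,  # outcome 2: SYNTAX >= 23
    }


# Hypothetical scores from three readers for one angiogram segment.
example = syntax_outcomes([21.0, 24.0, 25.5])
```

With these hypothetical scores the averaged SYNTAX score is 23.5, so both outcomes are positive.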
A survey indicating the equivalence between sequence restoration capability and coronary artery disease
This survey is conducted on Amazon Mechanical Turk (MTurk), a popular online crowdsourcing platform that facilitates the recruitment of human workers for various tasks, often referred to as Human Intelligence Tasks (HITs). Launched in 2005, MTurk connects requesters (typically researchers, businesses, or organizations) with a global workforce of "Turkers," individuals who perform tasks for compensation. MTurk gives quick access to a large number of participants, which makes it well suited to survey studies such as ours. An ethics statement is included in Supplemental Material Sect. 9 to address ethical considerations related to participant recruitment, data privacy, and compliance with ethical guidelines.
In our survey, we recruit 413 participants to ensure a robust dataset. Participants are asked to complete 20 sequence restoration questions (an example question is shown in Fig. 1). We randomly select 20 coronary angiographies, manually pick 5 frames from each, and randomly shuffle the 5 frames to construct a new image sequence for each question. We ask the participants to restore the sequence for all questions within 15 min. We use Cohen’s Kappa to quantify the consistency between the restored sequence and the actual sequence. We also report the inter-observer agreement percentage as a second measurement to check the robustness of the final results (Supplemental Table 1). Cohen’s Kappa is a statistical metric that assesses the agreement between two raters or evaluators, correcting for the agreement that could occur by chance. The value of Kappa ranges from -1 to 1, where 1 indicates perfect agreement, 0 indicates no agreement beyond chance, and negative values indicate worse-than-chance agreement. In our case, Cohen’s Kappa is used to evaluate the consistency between the predicted sequence (restored by MTurk participants) and the actual sequence of frames.
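As a minimal sketch of how Cohen’s Kappa could be computed for one restored sequence, the code below treats the actual frame index and the restored frame index at each position as the labels given by two raters. This framing, the function name `cohens_kappa`, and the handling of the degenerate case are assumptions made for illustration; the text does not specify how answers are pooled across questions or participants.

```python
from collections import Counter


def cohens_kappa(actual, restored):
    """Cohen's kappa between two equally long label sequences."""
    if len(actual) != len(restored) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(actual)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(a == r for a, r in zip(actual, restored)) / n
    # Chance agreement: product of the marginal label frequencies.
    actual_counts = Counter(actual)
    restored_counts = Counter(restored)
    expected = sum(
        actual_counts[label] * restored_counts[label] for label in actual_counts
    ) / (n * n)
    if expected == 1.0:
        # Both sequences use one identical label; kappa is undefined here and
        # this sketch returns 1.0 by choice.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For a five-frame question, a perfect restoration gives a kappa of 1, swapping the last two frames gives a value of about 0.5, and a cyclic shift that misplaces every frame gives a negative value.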
A Survey Question Example. In our survey, participants are presented with a task involving the restoration of the original sequence for 20 questions, each consisting of a video made up of 5 frames. The frames are shuffled. Participants are tasked with restoring the frames to reconstruct the original order, thereby demonstrating their sequence restoration capability. The task aims to evaluate the effectiveness of sequence restoration as an indicator for diagnosing conditions related to coronary artery disease.
To ensure the robustness of the final results, we take additional steps to handle potential outliers and poor performance. Some participants may provide invalid inputs, such as answering the sequence restoration questions in a monotonically increasing or decreasing order (e.g., 1, 2, 3, 4, 5 or 5, 4, 3, 2, 1), which would be inconsistent with the expected sequence restoration task. To address this, we remove such outliers before further calculations.
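The outlier rule can be sketched as follows. The text does not state whether a participant is removed after a single monotone answer or only when all answers are monotone; this sketch assumes the latter, and the names `is_monotone` and `drop_invalid_participants` as well as the data layout are hypothetical.

```python
def is_monotone(answer):
    """True if the answer is strictly increasing or strictly decreasing."""
    pairs = list(zip(answer, answer[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def drop_invalid_participants(responses):
    """Keep participants who gave at least one non-monotone answer.

    `responses` maps a participant id to a list of answers, one per question;
    each answer is the list of frame positions the participant submitted.
    """
    return {
        pid: answers
        for pid, answers in responses.items()
        if not all(is_monotone(answer) for answer in answers)
    }
```

A stricter variant would drop a participant after any monotone answer; which rule matches the study is not stated in the text.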
Moreover, some participants perform poorly, which may indicate that they lack the capability to complete the task accurately, so we group participants based on their performance. The groups are Top10 (the 10 best-performing MTurk participants), Top20, Top30, Top40, Top50, Top100, Top200, and Top300, and for each group we record the Cohen’s Kappa performance. This approach focuses on the most competent participants by removing those with the worst performance, ensuring that the results are more reliable and representative of the true ability of the participants to restore the sequence.
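The grouping can be illustrated with a short ranking routine. The group sizes are those listed in the text; the function names and the assumption that each participant has a single summary kappa are illustrative only.

```python
GROUP_SIZES = (10, 20, 30, 40, 50, 100, 200, 300)


def top_n_groups(kappa_by_participant, sizes=GROUP_SIZES):
    """Rank participants by kappa (best first) and cut nested Top-N groups."""
    ranked = sorted(
        kappa_by_participant, key=kappa_by_participant.get, reverse=True
    )
    # Groups larger than the number of available participants are skipped.
    return {f"Top{n}": ranked[:n] for n in sizes if n <= len(ranked)}


def group_mean_kappa(kappa_by_participant, members):
    """Mean kappa over the members of one group."""
    return sum(kappa_by_participant[p] for p in members) / len(members)
```

Because the groups are nested, every member of Top10 also belongs to Top20 and all larger groups.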
A self-supervised deep learning model to restore sequence without labeling
Data collection and preprocessing
From January 1st 2018 to September 30th 2018, we collect 4458 de-identified coronary angiography videos from 353 inpatients. We define an “Interferent Set” of videos that contain static interferents and a “Useless Set” of low-quality videos, both illustrated in Supplemental Fig. 2. There are 115 videos in the “Interferent Set” and 2522 videos in the “Useless Set”. We exclude the “Useless Set” because these videos carry no usable information. We include the “Interferent Set” to demonstrate that our model is robust to such static data noise. (We also exclude the “Interferent Set” and rerun the model as a robustness check in Table 4.)
The Dense Optical Flow Calculation Process. The process of blood vessel visualization under contrast agent in angiography is illustrated using the HSV color spectrum (Supplemental Material Sect. 5). In this representation, the optical flow vectors, corresponding to the motion of the contrast agent, are mapped to the HSV color space. The Hue (H) represents the direction of the flow, while the magnitude (Mag) is captured by the Value (V) component. As the magnitude of the optical flow increases, the Value (V) becomes brighter, meaning that lighter colors indicate a larger magnitude of flow. This provides a clear and intuitive way to observe the dynamics of blood vessels during the angiography process: brighter colors signify faster or more intense flow, while darker colors correspond to slower or less intense flow.
Table 1 provides a summary of key clinical data from the 353 patients, including the mean and standard deviation (SD) for several variables related to coronary artery disease in the study population. Overall, this table highlights the variability in the clinical characteristics of the study participants, with wide ranges observed for several measures, especially the stenosis percentages.
After exclusion, 746 RCA and 1190 LCA videos remain. To alleviate overfitting, we randomly divide the 353 patients into 5 groups. In turn, each group serves as the test set while the other 4 groups form the training set, giving fivefold cross-validation.
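A patient-level split of this kind can be sketched as follows. The seed, the function names, and the round-robin assignment are assumptions made for illustration; the text only states that the 353 patients are randomly divided into 5 groups and that each group is used once as the test set.

```python
import random


def patient_folds(patient_ids, n_folds=5, seed=0):
    """Randomly partition patient ids into n_folds disjoint groups."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    # Round-robin assignment keeps the group sizes within one of each other.
    return [ids[i::n_folds] for i in range(n_folds)]


def train_test_splits(folds):
    """Yield (train_ids, test_ids), using each fold once as the test set."""
    for i, test_ids in enumerate(folds):
        train_ids = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train_ids, test_ids
```

Splitting by patient rather than by video keeps all videos of one patient on the same side of the split, so no patient contributes to both training and testing.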
Model development
Based on the novel method and its effectiveness, we design and implement a self-supervised model to learn the capability to recover the shuffled sequence of Coronary Angiography.
Step 1 Disarranging Sequence Order of Coronary Angiography.
A coronary angiography video is composed of many frames, each of which is an image recording the opacification state of the vessels at a certain time after the contrast agent is injected. We shuffle the frame sequence and use it as the input to the “Zone” (optical flow generation) part.
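The disarranging step can be sketched as a shuffle that also records the permutation, so that the original order remains available as the self-supervised target. The function names are hypothetical, and a uniformly random permutation is assumed here; the study's exact shuffling scheme may differ.

```python
import random


def disarrange(frames, seed=None):
    """Shuffle frames and return (shuffled_frames, order).

    order[k] is the original index of the frame now at position k.
    """
    order = list(range(len(frames)))
    random.Random(seed).shuffle(order)
    shuffled = [frames[i] for i in order]
    return shuffled, order


def restore(shuffled, order):
    """Invert the permutation to recover the original frame order."""
    restored = [None] * len(shuffled)
    for position, original_index in enumerate(order):
        restored[original_index] = shuffled[position]
    return restored
```

No manual label is needed: the recorded permutation is the ground truth against which a restoration is scored.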
Step 2 We are concerned about the dynamic process of blood vessels during contrast injection. We use optical flow technology to extract sequence features and form an effective zone to calculate on.
Optical flow is the instantaneous velocity of the pixels of a moving object in the observed imaging plane. It uses the change of pixels between adjacent frames of an image sequence to calculate the motion of objects between those frames. Optical flow methods can generally be divided into two categories: sparse optical flow and dense optical flow. The Lucas and Kanade25 method is a typical way to calculate sparse optical flow. It requires a separate method to locate corner points that serve as the initial points of the optical flow. We use the Shi and Tomasi26 method to locate the corner points but find that these points are only weakly related to vessels because of various disturbances such as spinal pixels, heartbeat, and changes of angle. That is, the corner points are not generated on the vessel, so the resulting optical flow fails to represent the dynamics of the contrast agent injection process. We therefore compute the dense optical flow proposed by Farneback27, which computes the optical flow of all pixels between frames. It relies mainly on polynomial expansion and the assumption of pixel displacement invariance to reach the optimal optical flow.
We then face the problem of selecting effective feature points to form an active zone on which to perform the rearranging task. Dense optical flow is calculated for all pixels in a frame, but only a small part of them is related to blood vessels. Analyzing all the pixels raises two problems. (1) A compensation mechanism would bias the results: since coronary angiography also records the movement of part of the spine, the heartbeat, and changes of camera angle, this additional information could make it easier to recover the order even when the vascular opacification itself is hard to recover. (2) The dimension of the Farneback optical flow is up to 262,144 (512 × 512) per frame, which would incur the curse of dimensionality.
It is thus necessary to develop a screening mechanism to select an “effective zone” in which the optical flows are generated from vessel pixels. We screen the whole set of points for those with an optical flow magnitude above the 98th percentile, which gives a restricted zone of 5243 points. Because of GPU memory limits, we then randomly select 5000 points (95%) from the restricted zone as the effective zone. Note that if GPU memory were not limited, the “effective zone” generation step could be skipped.
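The screening step can be sketched in pure Python as below. This is a minimal illustration assuming one magnitude value per pixel (for example, an aggregate over frames); how the study aggregates magnitudes over a video is not specified here, and the function names and seed are hypothetical.

```python
import random


def restricted_zone(magnitudes, percentile=98.0):
    """Indices of the points whose magnitude lies above the given percentile."""
    n = len(magnitudes)
    keep = n - int(n * percentile / 100.0)  # number of points in the top tail
    ranked = sorted(range(n), key=magnitudes.__getitem__, reverse=True)
    return ranked[:keep]


def effective_zone(magnitudes, percentile=98.0, zone_size=5000, seed=0):
    """Randomly subsample the restricted zone down to zone_size points."""
    restricted = restricted_zone(magnitudes, percentile)
    if len(restricted) <= zone_size:
        return sorted(restricted)
    return sorted(random.Random(seed).sample(restricted, zone_size))
```

For a 512 × 512 frame (262,144 points), the top 2% contains 5243 points, which matches the restricted zone size quoted in the text, and the subsample keeps 5000 of them.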
Figure 2 shows the dense optical flow calculation process, from which we observe that the points associated with blood vessels generally change sharply in color and brightness. In the visualization, different colors and brightness levels express different angles and magnitudes of optical flow.
Figure 3 shows a vessel in the polar coordinate plane. The final zones and their optical flow traces are illustrated in Fig. 4, column C and column D.
The Polar Coordinate Plane View. To express the optical flow of an angiography video using polar coordinates, we can convert the Cartesian coordinates (typically represented as x and y) into polar coordinates, which are defined by the angle and magnitude. This transformation is particularly useful because the HSV (Hue, Saturation, Value) color model used has a one-to-one mapping with polar coordinates, making the visual representation of optical flow more intuitive. By converting the optical flow vectors into polar coordinates, we can easily visualize the direction and magnitude of the flow through color. For example, a red hue might indicate motion in a certain direction, with the brightness of the red color corresponding to the speed (magnitude) of the flow.
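The mapping from a flow vector to a color can be sketched with the Python standard library. The sketch assumes full saturation and a linear scaling of magnitude to Value, neither of which is specified in the text; the function names are hypothetical.

```python
import colorsys
import math


def flow_to_polar(dx, dy):
    """Convert a Cartesian flow vector to (magnitude, angle in [0, 2*pi])."""
    magnitude = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) % (2.0 * math.pi)
    return magnitude, angle


def polar_to_rgb(magnitude, angle, max_magnitude):
    """Map direction to Hue and magnitude to Value, then convert to RGB."""
    hue = angle / (2.0 * math.pi)
    value = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    return colorsys.hsv_to_rgb(hue, 1.0, value)  # saturation fixed at 1 (assumed)
```

Under these assumptions, a vector pointing along the positive x axis with maximal magnitude maps to pure red, and a zero vector maps to black.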
The Effective Zone Generation. The dataset consists of coronary angiography sequences, with the first three rows representing normal coronary angiographies and the last three rows representing abnormal coronary angiographies. The structure of the data is organized into four columns: Column (A): These are the original angiography frames, representing the raw video images of the coronary arteries before any processing or analysis. Column (B): In this column, the Farneback optical flow is applied to the frames in the segment. Column (C): This column shows the selected 5000 optical flow points from the optical flow point set. The optical flow point set contains the motion vectors for each pixel or region of the image. Out of these points, a subset of 5000 points are selected, based on certain criteria (highest magnitude), to focus the analysis on the most relevant or prominent flow patterns. Column (D): This column illustrates the optical flow traces corresponding to the selected 5000 points. These traces represent the trajectories of the selected optical flow points over time, providing a visual representation of the motion patterns within the angiography sequence. The flow traces show how the selected points move from one frame to the next, indicating the direction and magnitude of the motion.
Step 3 Learning.
The objective of the learning model is to learn to recover disordered frames. Once the model is trained, its accuracy is used as an indicator of Sequence Restoration Capability. The model works in a self-supervised manner (a kind of unsupervised learning): no labels need to be provided before training.
As our input is the position sequence of the optical flow from the final effective zone, we use a recurrent neural network (RNN), an architecture widely applied to learn and predict language sequences. Sutskever et al.28 found that using two separate RNNs works well for sequence-to-sequence prediction. In our model, the RNN we use is the GRU (Gated Recurrent Unit) introduced by Cho et al.29 This architecture handles variable-length inputs, which correspond to the different numbers of frames in each angiography video. The overall Encoder-Decoder GRU architecture is illustrated in Fig. 5.
The Encoder-Decoder GRU Model. The Encoder-Decoder GRU (Gated Recurrent Unit) model is a type of neural network architecture commonly used for sequence-to-sequence tasks. The input sequence is passed one element at a time through the encoder’s GRU layer (Blue Blocks). The GRU layer computes the hidden states at each time step and updates the memory of the model. These hidden states effectively “encode” the information in the input sequence. After processing the entire input sequence, the final hidden state of the encoder is often used as a context vector (Grey Block), which summarizes the information in the input sequence. This context vector is passed to the decoder. The decoder receives the context vector (or context vectors) from the encoder and generates the output sequence. At each time step, the decoder generates a new output through the decoder’s GRU layer (Yellow Blocks) based on its previous output and the encoder’s context.
The encoder iterates at the image level: each input is the optical flow values of the pixels in one image. The input passes through a GRU that generates an output and a hidden state, and the current hidden state then acts as part of the next input. The decoder also works image by image. Given the encoder’s fixed context vector and the current target, it passes them through a GRU to predict the next target.
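To make the encoder recurrence concrete, the following is a minimal pure-Python sketch of a GRU cell applied frame by frame. It is not the study's implementation: the weights are random, the sizes are tiny, the class and function names are hypothetical, and the state update uses the convention h' = (1 - z) * h + z * candidate (some formulations swap the roles of z and 1 - z). The decoder is omitted.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(params, x, h):
    """Compute W @ x + U @ h + b for plain Python lists."""
    W, U, b = params
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


class GRUCell:
    """A single GRU cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def params():
            W = [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
                 for _ in range(hidden_size)]
            U = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                 for _ in range(hidden_size)]
            return W, U, [0.0] * hidden_size

        self.hidden_size = hidden_size
        self.update, self.reset, self.candidate = params(), params(), params()

    def step(self, x, h):
        z = [_sigmoid(v) for v in _affine(self.update, x, h)]  # update gate
        r = [_sigmoid(v) for v in _affine(self.reset, x, h)]   # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in _affine(self.candidate, x, gated)]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(cell, frames):
    """Run the cell over a variable-length sequence; return the final state."""
    h = [0.0] * cell.hidden_size
    for x in frames:
        h = cell.step(x, h)
    return h
```

The final hidden state plays the role of the context vector, and its length does not depend on the number of frames, which is why videos of different lengths can share one model.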
The loss of the model is the MSE between the true vertical coordinate sequence of the optical flow and the predicted one. It is used as a measurement of Sequence Restoration Capability. The details of the model are given in Appendix Sect. 6. The self-supervised model is termed Disarranged Zone Learning (DZL). The overall framework of DZL is illustrated in Fig. 6. The hyperparameter settings of the model are listed in Supplemental Material Sect. 7.
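The loss itself is a plain mean squared error, sketched below for two equally long coordinate sequences. The function name and the flat list layout are assumptions; in the model the sequences would hold the vertical optical flow coordinates of the effective zone.

```python
def mse(true_sequence, predicted_sequence):
    """Mean squared error between two equally long numeric sequences."""
    if len(true_sequence) != len(predicted_sequence) or not true_sequence:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [
        (t - p) ** 2 for t, p in zip(true_sequence, predicted_sequence)
    ]
    return sum(squared_errors) / len(squared_errors)
```

A lower value means the predicted order is closer to the true one, so the loss can be read directly as an inverse measure of restoration capability.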
The Overall Framework for DZL. The overall model can be divided into three parts: Disarranged, Zone, and Learning. The Disarranged part disorders the frames of a coronary angiography video. We shuffle the video frames in a reversed order. The Zone part generates an effective zone that captures the sequence information of the video. The Learning part learns how to recover the disarranged order. For DZL, an Encoder-Decoder GRU (Gated Recurrent Unit) neural network is designed and applied as the core of the self-supervised model.
Ethics statement
This study is approved by Rui Jin Hospital Ethics Committee, NO. 2018–15. All methods are performed in accordance with the relevant guidelines and regulations.
Results
The survey
This study examines the performance of 413 participants. Table 2 summarizes the performance of participants in restoring sequences for both Normal and Abnormal coronary angiographies across different performance groups. The key metrics include performance (Cohen’s Kappa) for each group, the standard error (SE) for both normal and abnormal cases, and the p-value indicating statistical significance. Top 10: The highest-performing group, with a performance of 0.838 for normal angiographies and 0.725 for abnormal angiographies. The p-value (< 0.001) indicates a significant difference in performance between normal and abnormal sequences. Top 20 to Top 50: Performance decreases progressively, with the Top 20 group achieving 0.743 (Normal) and 0.604 (Abnormal), and the Top 50 group achieving 0.495 (Normal) and 0.399 (Abnormal). In all these groups, the p-values remain significant (< 0.001), suggesting a consistent performance difference between normal and abnormal angiographies. Top 100: Performance further drops to 0.339 (Normal) and 0.291 (Abnormal), with the p-value still significant (< 0.001), though the difference between normal and abnormal becomes less pronounced compared to higher-performing groups. Top 200 to Top 300: The performance continues to decrease, with the Top 200 group showing 0.203 (Normal) and 0.196 (Abnormal), and the Top 300 group showing 0.129 (Normal) and 0.139 (Abnormal). In these groups, the p-values suggest no significant difference between normal and abnormal performance: 0.401 for Top 200 and 0.101 for Top 300, indicating the performance difference is no longer statistically significant. Overall, the results show that higher-performing participants (Top 10–Top 100) consistently exhibit significantly better performance for normal angiographies than for abnormal ones, whereas for lower-performing groups (Top 200–Top 300), this difference becomes less pronounced, and statistical significance is lost. 
Figure 7 visualizes the performance of the Top10 MTurk participants. It shows a significant gap in restoration performance between normal and abnormal coronary angiographies.
The Survey and the Results Illustration for Top10 MTurks. The graph displays a comparison of the confusion matrix visualizations for the top 10 performing MTurk participants, categorized into two groups: Normal (left) and Abnormal (right). Each confusion matrix is represented as a heatmap, with the x-axis denoting the predicted sequence and the y-axis showing the actual sequence. The diagonal elements represent correct predictions, where the predicted frame matches the actual frame, indicating successful sequence restoration. For example, if frame 5 is predicted as frame 5, it appears as a dark blue cell along the diagonal. Non-diagonal cells highlight prediction errors, such as when frame 1 is predicted as frame 4. These non-diagonal cells are color-coded to reflect the severity of the mistakes, with darker colors indicating more significant errors. The heatmap reveals that most predictions are correct, as the diagonal cells are predominantly dark blue. However, the non-diagonal cells, which represent errors, show that the Normal group has fewer mistakes than the Abnormal group. Specifically, the non-diagonal elements in the Normal group are lighter, suggesting fewer and less severe errors, whereas the Abnormal group displays darker non-diagonal cells, indicating more frequent and significant prediction mistakes. The difference between the Normal and Abnormal groups can be used to distinguish the two, with the Normal group being easier to predict and exhibiting greater accuracy. This distinction in performance serves as the intuitive basis for our proposed DZL (Disarranged, Zone, Learning) model.
Automation model
PCI/CABG implication
In the context of percutaneous coronary intervention (PCI), the SYNTAX score serves as an important tool for evaluating the complexity of a patient’s coronary artery disease.37,38 In patients planned for simultaneous CAG and PCI, physicians may refer to the SYNTAX score during the angiography to decide whether to proceed with PCI.39 If the SYNTAX score is low (e.g., < 23): This indicates simpler lesions, and the patient is more likely to benefit from PCI, leading the physician to consider proceeding with the intervention. If the SYNTAX score is high (e.g., ≥ 23): This indicates more complex lesions, which may lead to suboptimal outcomes or increased risks with PCI. In such cases, the physician might choose to forgo PCI and recommend alternative treatment options, such as coronary artery bypass grafting (CABG). Our proposed model indicates whether SYNTAX ≥ 23, which could play a crucial role in the decision-making process for CABG and PCI. By assessing lesion complexity, physicians can make choices about whether to perform PCI during the angiography, thereby providing personalized treatment plans and enhancing therapeutic outcomes for patients.40 The overall performance of our model with hidden size 5000 is shown in Table 3. Group 1 shows an AUC-median of 82.13% (95% CI: 70.31–91.22), with perfect sensitivity (100%) and NPV (100%). However, its specificity is relatively low at 68.75%, and PPV is 74.42%. Group 2 demonstrates the highest specificity (87.5%) and PPV (88.89%), with an AUC-median of 89.16% (95% CI: 76.95–98.44) and perfect sensitivity and NPV. Group 3 shows an AUC-median of 78.89% (95% CI: 67.13–89.45), with sensitivity and NPV both at 100%, but its specificity and PPV are lower, at 64.71% and 73.91%, respectively. Group 4, with an AUC-median of 70.22% (95% CI: 55.33–82.22), has the lowest sensitivity (56.67%) but very high specificity (93.33%) and PPV (89.47%). The NPV for this group is also relatively low at 68.18%.
Group 5 shows an AUC-median of 80.02% (95% CI: 67.90–90.05), with sensitivity at 76.47% and specificity at 88.24%. The PPV is 86.21%, and NPV is 78.05%. Overall, the average AUC across all five groups is 80.08%, reflecting a generally good performance across the groups, with variations in sensitivity, specificity, and predictive values.
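The reported quantities can be related to their definitions with a short sketch. The five AUC values are those quoted in the text; the confusion-matrix counts in the usage note are illustrative only (chosen so that the ratios agree with the percentages reported for Group 2, whose true counts are not given here), and the function names are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """AUC as the probability that a positive case outscores a negative one."""
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positive_scores
        for n in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# AUC-median values (%) for Groups 1-5 as quoted in the text.
reported_auc = [82.13, 89.16, 78.89, 70.22, 80.02]
mean_auc = sum(reported_auc) / len(reported_auc)
```

The mean of the five reported values is 80.084%, consistent with the average quoted in the text. As an example, `diagnostic_metrics(tp=16, fp=2, tn=14, fn=0)` yields a sensitivity and NPV of 1, a specificity of 0.875, and a PPV of about 0.889.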
Efficient
DZL is an efficient algorithm. For comparison, the reported median analysis time of existing QFR is 5 min in the FAVOR II Europe-Japan study13, whereas DZL can be calculated within 30 s. The detailed calculation time distributions are shown in Fig. 8.
Robust
We perform fivefold cross-validation on the dataset with and without the interferent set, as shown in Tables 3 and 4. The results indicate that the inclusion of the interferent set does not reduce the prediction performance, thereby demonstrating the robustness of DZL against static noise interference.
Importantly, flow in the right coronary artery can be, and frequently is, different from flow in the left (diastolic). However, we find DZL effective in both branches. We add the RCA DZL scores and LCA DZL scores together as the final DZL scores. Furthermore, we also include the “Interferent Set” in the training and testing process. Last but not least, we distinguish the RCA videos by 3 angles: a) LAO30°; b) CRA30°; c) RAO30°, and we distinguish the LCA videos by 6 angles: a) CAU20°; b) RAO30° + CAU20°; c) LAO45° + CAU20°; d) CRA30°; e) RAO30° + CRA20°; f) LAO45° + CRA20°. The overall angle distributions of RCA and LCA in the 5 groups are illustrated in Fig. 9. Consequently, DZL is robust to different angiography angles as well.
Distributions of Different Angles of 5 Groups. We distinguish the RCA videos by 3 angles: a) CRA30°; b) LAO30°; c) RAO30°. We distinguish the LCA videos by 6 angles: a) CAU20°; b) CRA30°; c) LAO45° + CAU20°; d) LAO45° + CRA20°; e) RAO30° + CAU20°; f) RAO30° + CRA20°. All the videos with these angles are included in the training and testing process, demonstrating the robustness of DZL from this perspective. (LAO = Left Anterior Oblique. RAO = Right Anterior Oblique. CRA = Cranial. CAU = Caudal.)
Discussion
Key innovations and clinical implications
Prior studies focused on high-quality frame selection, neglecting the value of sequence information in angiographies14. Few have explored sequence optimization, and none with a dedicated focus on disordered restoration. No previous research has established the relationship between disordered restoration capability and CAD severity, making this study pioneering in this respect. Because medical data annotation requires expertise and is costly, the application of AI, especially deep learning, has been limited; this study’s self-supervised model avoids the need for manual annotation, facilitating better resource allocation and enabling larger datasets to enhance model performance. The study uniquely applies a sequence-to-sequence recurrent neural network model to simulate the disordered restoration capability of coronary angiography.
DZL can automate the analysis of coronary angiography videos, enabling rapid assessment of coronary artery disease severity. As shown in Table 3, the average AUC of DZL is 80.084%. By providing automated feedback on whether a patient is more suitable for Percutaneous Coronary Intervention (PCI) or should be referred for Coronary Artery Bypass Grafting (CABG) during the angiography process, DZL can reduce the time physicians spend analyzing images and making complex decisions. The model can provide decision support by suggesting whether the patient’s coronary artery condition warrants PCI or if a referral for CABG might be more appropriate. This real-time feedback could aid physicians in making more informed, evidence-based decisions, thus improving the decision-making process.31 By offering consistent, data-driven insights, DZL could help reduce the chances of over- or under-treatment. In the context of PCI versus CABG, accurate decision-making is critical for optimal patient outcomes. The use of DZL could therefore reduce procedural risks and improve recovery outcomes for patients. DZL can standardize the evaluation of angiography images, reducing inter-physician variability in diagnosis and treatment planning. This consistency can lead to more reliable and predictable outcomes for patients.
Theoretical background
Coronary angiography (CAG) is presented as video that records the dynamic flow of contrast agent through the blood vessels. Regions that the contrast agent fails to opacify are the locations of suspected lesions. From the perspective of human vision, the opacification process is an integrated temporal and spatial movement: the antegrade contrast material moves forward within a specified time window. Spatial movement alone is not a sufficient representation, since the antegrade contrast material should enter or clear (or both) at a predefined speed15,16,17. We draw on two branches of theory about how the human brain processes temporal and spatial information: CMT (Conceptual Metaphor Theory) and ATOM (A Theory of Magnitudes). (Detailed introductions to the two theories and their suitability for this study are given in Supplemental Material Sect. 3.) Building on both theories, we claim that time can be represented by space. Shuffling the time factor therefore rearranges the antegrade contrast agent spatially. The disordered time breaks the spatial coherence of a smooth opacification process, and this incoherence, once it reaches a certain extent, can be observed visually by humans, who can then use it to restore the sequence. Conversely, if shuffling the time brings little or no obvious spatial incoherence (because the opacification process is less smooth or even blocked), humans are not able to restore the sequence. Hence the Sequence Restoration Capability of the shuffled frames is an indicator of the state of coronary artery disease. We thus link Sequence Restoration Capability with Opacification Smoothness, which provides the foundation for our hypothesis linking Sequence Restoration Capability with Coronary Artery Disease.
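The shuffle-and-restore idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `shuffle_frames` and `restoration_accuracy` are hypothetical, the frames are string stand-ins for image arrays, and position-wise agreement is only one assumed way of scoring a restoration.

```python
import random


def shuffle_frames(frames, seed=None):
    """Shuffle frames; return (shuffled, order) with shuffled[i] == frames[order[i]]."""
    rng = random.Random(seed)
    order = list(range(len(frames)))
    rng.shuffle(order)
    return [frames[i] for i in order], order


def restoration_accuracy(true_order, predicted_order):
    """Fraction of shuffled positions whose original index is predicted correctly.

    Assumption: position-wise agreement is used as the score; the paper may
    define its restoration score differently.
    """
    if len(true_order) != len(predicted_order):
        raise ValueError("orders must have the same length")
    if not true_order:
        return 0.0
    hits = sum(t == p for t, p in zip(true_order, predicted_order))
    return hits / len(true_order)


# Stand-ins for eight angiography frames (hypothetical data).
frames = [f"frame_{i}" for i in range(8)]
shuffled, order = shuffle_frames(frames, seed=0)

# A perfect restorer knows the original index of every shuffled frame.
restored = [None] * len(frames)
for position, original_index in enumerate(order):
    restored[original_index] = shuffled[position]
```

Under the hypothesis above, a smooth opacification process would make the true order easy to recover (accuracy near 1), whereas a blocked or poorly opacified vessel would leave a restorer, human or model, closer to guessing.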
Label free
Image recognition has many mature annotated datasets, the best known being ImageNet18, and video recognition has others such as Kinetics. These databases cover general scenarios. In medical imaging, however, and especially in coronary angiography, no mature annotated dataset exists. Researchers applying deep learning in this field must carry out the labeling themselves, which is costly because the annotation must be done by professional doctors19,20,21,22,23,24. This greatly limits the development of deep learning, especially supervised learning, in medical imaging.
The label-free learning approach provides significant cost savings compared with traditional methods that rely on manually labeled data32. We estimate that, on average, label-free learning can reduce data labeling time, leading to savings of approximately $30 per angiography; the detailed annotation costs are estimated in Supplemental Material Sect. 4. Additionally, label-free methods scale more efficiently to large datasets, reducing training time by more than a factor of 10 compared with supervised models33, whose cost increases linearly with dataset size.
DZL is a self-supervised learning model, a category that falls under unsupervised learning. Training the model requires no manual annotations: in our application, the training labels are whether each angiography frame is shuffled or not. The learned model is then used to output a DZL score for each angiography segment, and these scores are compared with the ground truth to assess the model’s AUC.
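The evaluation step, in which scores are compared with ground truth to obtain an AUC, can be sketched as follows. This is a generic rank-based AUC in plain Python, not the authors' evaluation code. The function name `auc_from_scores` and the example values are hypothetical, and the sketch assumes the score is oriented so that a higher value indicates more severe disease (for example, one minus restoration accuracy).

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a positive case scores above a negative one.

    Ties count as half a win. Labels are 1 (disease) or 0 (no disease).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical restoration accuracies for five angiography segments.
restoration = [0.9, 0.5, 0.4, 0.3, 0.6]
# Hypothetical ground truth: 1 = disease present, 0 = absent.
truth = [0, 0, 1, 1, 1]

# Assumed orientation: poor restoration means higher risk.
risk = [1.0 - r for r in restoration]
auc = auc_from_scores(risk, truth)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; a sorted-rank formulation gives the same value more efficiently on large cohorts.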
Potential of DZL
We infer that DZL still has considerable room for improvement. The model uses optical flow to generate an effective zone that represents the sequence features between frames, which is a form of dimension reduction. The dimension of the model input is reduced by more than 50 times (we randomly select 5000 points from the “Restricted Zone” of 5243 points within the full 512*512 frame; Hidden Size = 5000) while model performance is maintained at a good level. Dimension reduction nevertheless entails loss of information and abstraction of features, which lowers the AUC. With greater parallel computing power, such as GPUs with more memory or a larger number of GPUs (current GPU setting: two NVIDIA GeForce RTX 3090-32G), the model could be designed and implemented with less aggressive dimension reduction, which could improve the AUC. To support this argument, we report the final results (Table 5) of a model that randomly selects only 1500 points from the “Restricted Zone” (Hidden Size = 1500); the results deteriorate considerably, as expected. This counter-experiment indicates that the number of points selected is positively related to final performance.
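The size of this reduction can be checked with a short sketch. The frame side, zone size and hidden sizes are taken from the text; the zone itself is a placeholder (the first 5243 flat pixel indices), since the real zone is derived from optical flow, and the helper names are hypothetical.

```python
import random

IMAGE_SIDE = 512     # frame is 512*512 pixels (from the text)
ZONE_SIZE = 5243     # points in the "Restricted Zone" (from the text)
HIDDEN_SIZE = 5000   # points kept as model input (from the text)


def sample_zone_points(zone_points, k, seed=None):
    """Randomly keep k distinct points from the zone, returned in sorted order."""
    rng = random.Random(seed)
    return sorted(rng.sample(zone_points, k))


def reduction_factor(side, k):
    """How many times smaller the input is than the full side*side frame."""
    return (side * side) / k


# Placeholder zone: in the real model these indices would come from optical flow.
zone = list(range(ZONE_SIZE))
selected = sample_zone_points(zone, HIDDEN_SIZE, seed=0)

factor_5000 = reduction_factor(IMAGE_SIDE, HIDDEN_SIZE)  # 262144 / 5000
factor_1500 = reduction_factor(IMAGE_SIDE, 1500)         # 262144 / 1500
```

With 5000 points the input is roughly 52 times smaller than the full frame, consistent with the "more than 50 times" figure; with 1500 points the reduction is roughly 175 times, which matches the direction of the deterioration reported in Table 5.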
The comparison with traditional supervised and unsupervised models
We compare our performance metrics with those of diagnostic models that leverage temporal sequence information. Our method achieves a sensitivity of 87.89%, which is comparable to Wu et al. (87.2%)4 and higher than Pang et al. (82.22%)5, demonstrating strong performance in identifying true positives across diverse subgroups. Our PPV of 82.58% is slightly higher than that of Wu et al. (79.5%) but lower than that of Pang et al. (94.87%). Furthermore, our AUC of 80.08% reflects a strong overall discriminative ability. These results are comparable, yet those models rely heavily on supervised labels, whereas our model is self-supervised and requires no manually labeled data.
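For reference, the two metrics compared above follow from confusion-matrix counts in the standard way. The sketch below uses made-up counts purely for illustration; they are not the study's data.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return tp / (tp + fn)


def ppv(tp, fp):
    """Positive predictive value (precision): TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


# Hypothetical counts, for illustration only.
tp, fn, fp = 80, 20, 20
example_sensitivity = sensitivity(tp, fn)
example_ppv = ppv(tp, fp)
```

Sensitivity depends only on the diseased cases, whereas PPV also depends on how many healthy cases are flagged, so two models can rank differently on the two metrics, as in the comparison with Pang et al.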
According to the review by Wang et al.30, 44.83% of the studies within its scope report AUCs higher than 90%; on closer inspection, however, all of those models are supervised. In contrast, our DZL model is self-supervised, meaning it can learn sequence restoration capability from unlabeled data.
Unsupervised approaches of this kind have been gaining traction in medical imaging, particularly for tasks where labeled data are scarce or unavailable. For instance, Generative Adversarial Networks (GANs)41 have been employed for unsupervised image restoration in medical imaging, though these methods typically focus on image quality rather than sequence restoration. The novelty of our approach lies in learning the sequence restoration capability directly from the disordered frames in coronary angiography, which sets it apart from supervised CNNs and LSTMs as well as from existing unsupervised models. Our model assesses CAD severity by evaluating the accuracy of sequence restoration, making it particularly useful in clinical settings where annotation is impractical.
Alternative model core: LSTM
To demonstrate the robustness of our model, we replace GRU with LSTM as the core component of the encoder-decoder sequential model. The detailed reasoning for this replacement is given in Supplemental Material Sect. 8. The modified model is trained and tested, and its results are presented in Supplemental Table 2.
This adjustment allows us to explore whether LSTM, another popular architecture for sequence modeling, provides any improvement or additional insight into restoration capability and its correlation with CAD diagnosis. Comparing Table 3 with Supplemental Table 2 shows that the performance obtained with GRU is comparable to that obtained with LSTM. This supports the use of GRU as the core of our model and the underlying logical structure of DZL for angiography diagnosis.
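One general difference between the two cores is their size: a GRU layer has three gated weight sets and an LSTM layer has four. The sketch below counts parameters under the textbook formulation with one bias vector per gate (some libraries use two). The layer sizes are hypothetical and are not the dimensions used in DZL.

```python
def recurrent_params(input_size, hidden_size, gates):
    """Parameters of one recurrent layer with the given number of gate blocks.

    Each gate block has an input weight matrix, a recurrent weight matrix
    and one bias vector (textbook formulation, assumed here).
    """
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    return gates * per_gate


def gru_params(input_size, hidden_size):
    # GRU: reset gate, update gate, candidate state.
    return recurrent_params(input_size, hidden_size, gates=3)


def lstm_params(input_size, hidden_size):
    # LSTM: input gate, forget gate, output gate, candidate cell state.
    return recurrent_params(input_size, hidden_size, gates=4)


# Hypothetical layer sizes, for illustration only.
gru_count = gru_params(64, 128)
lstm_count = lstm_params(64, 128)
```

For equal layer sizes the LSTM therefore carries 4/3 as many recurrent parameters as the GRU, so comparable accuracy from the GRU is obtained with the lighter core.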
Future work
While the current DZL model primarily relies on the SYNTAX score to assess anatomical severity in coronary artery disease (CAD), additional diagnostic measures that focus on the functional assessment of CAD, such as Fractional Flow Reserve (FFR), could further enhance its clinical applicability. In particular, the integration of FFR data obtained from coronary angiography presents a promising avenue for future research. With recent advancements, algorithms like Quantitative Flow Ratio (QFR) can now perform functional analysis based on coronary angiography in approximately 5 min. However, if DZL is demonstrated to work effectively in a functional context, it could offer a significant advantage, with processing times as short as 30 s. This real-time capability could be particularly valuable for guiding clinical decision-making during interventional procedures, providing clinicians with immediate insights for more informed, timely interventions.
Limitations
The data were collected within a limited timeframe (January–September 2018), which introduces potential “data bias” due to the restricted scope of the dataset. This limitation may result in an incomplete representation of the overall population, as the data might not adequately capture the full diversity of patterns or subgroups that exist beyond this specific period. To address this bias, future efforts could expand the dataset to cover a more comprehensive and representative range of cases, so that the model is robust and applicable to a wider variety of scenarios.
Next, a critical limitation of DZL is that it cannot assess the LAD and LCX separately. DZL works at the video level. The LCA and RCA are recorded in separate videos, but no video records the LAD or LCX alone, and this video structure prevents separate calculations for the LAD and LCX. In addition, DZL cannot locate the exact position of lesions; only overall DZL scores are available for the RCA and LCA respectively. Combining DZL with transfer learning offers a possible way to differentiate between the LAD and LCX in coronary angiography and to capture lesion morphology and vessel characteristics. Transfer learning leverages pre-trained models such as ResNet34 or EfficientNet35, which could be fine-tuned on a dedicated dataset of LAD and LCX lesions36. This would enable the model to learn critical artery-specific features, improving its ability to distinguish between the two arteries and to localize lesions within vessels.
Once fine-tuned, the model’s extracted features would be input into DZL for decision-making. This two-stage process could enhance diagnostic accuracy and reduce physician workload by automating lesion detection and clinical decision support. Future improvements could include integrating multi-modal data and temporal modeling, further refining decision-making for more precise and timely interventions.
Notably, DZL lacks the capability to localize lesions within the coronary vasculature, which is critical for targeting precise areas during interventions. Additionally, DZL’s ability to extract and analyze lesion-specific features, such as plaque characteristics and the degree of stenosis, remains limited. Expanding DZL’s functionality to include these features could significantly enhance its diagnostic and therapeutic potential, providing a more detailed and personalized approach to CAD management.
Last, typical image-level preprocessing techniques for coronary angiography are not applied in this work, because we want to test the capability of DZL on its own. Future work could gradually integrate related procedures, such as segmentation and centerline extraction, into the model.
Conclusions
This study presents an innovative approach that links the accuracy of restoring disordered coronary angiography frames with the diagnosis of coronary artery disease (CAD). We demonstrate that higher accuracy in restoring disordered frames correlates with a lower probability and severity of CAD.
The proposed self-supervised deep learning model consists of three key modules: Disarranged, Zone, and Learning. The Disarranged module randomizes the order of angiography frames, the Zone module extracts features using optical flow technology, and the Learning module utilizes a sequence-to-sequence encoding–decoding recurrent neural network for automated frame restoration. This modular approach enables the model to automatically diagnose CAD based on restoration accuracy.
The findings highlight the potential of combining sequence restoration accuracy with deep learning to assist in diagnosing CAD. Future advancements may include further optimization of the model and the incorporation of additional data sources to improve diagnostic precision.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
- Amazon Mturk: Amazon mechanical turk
- ATOM: A theory of magnitudes
- AUC: Area under the curve
- CAD: Coronary artery disease
- CAG: Coronary angiography
- CABG: Coronary artery bypass grafting
- CAU: Caudal
- CCTA: Coronary computed tomography angiography
- CMR: Cardiac magnetic resonance
- CMT: Conceptual metaphor theory
- CRA: Cranial
- CT: Computed tomography
- DZL: Disarranged zone learning
- FFR: Fractional flow reserve
- GPU: Graphic processing unit
- GRU: Gated recurrent unit
- LAD: Left anterior descending artery
- LAO: Left anterior oblique
- LCA: Left coronary artery
- LSTM: Long short term memory
- LV: Left ventricle
- PCI: Percutaneous coronary intervention
- QFR: Quantitative flow ratio
- RAO: Right anterior oblique
- RCA: Right coronary artery
- ROC: Receiver operating characteristic
- RNN: Recurrent neural network
- TIMI: Thrombolysis in myocardial infarction
References
Tavakol, M., Ashraf, S. & Brener, S. J. Risks and complications of coronary angiography: A comprehensive review. Global J. Health Sci. 4(1), 65 (2012).
Becker, C. Advances in cardiac imaging. Eur. Radiol. 13, N50–N52. https://doi.org/10.1007/s00330-003-0007-z (2003).
Molenaar, M. A. et al. Current state and future perspectives of artificial intelligence for automated coronary angiography imaging analysis in patients with ischemic heart disease. Curr. Cardiol. Rep. 24(4), 365–376. https://doi.org/10.1007/s11886-022-01655-y (2022).
Wu, W. et al. Automatic detection of coronary artery stenosis by convolutional neural network with temporal constraint. Comput. Biol. Med. 118, 103657 (2020).
Pang, K. et al. Stenosis-DetNet: Sequence consistency-based stenosis detection for X-ray coronary angiography. Comput. Med. Imaging Graph. 89, 101900 (2021).
Nieman, K. et al. Coronary angiography with multi-slice computed tomography. The Lancet 357(9256), 599–603 (2001).
Sianos, G. et al. The SYNTAX Score: an angiographic tool grading the complexity of coronary artery disease. EuroIntervention 1(2), 219–227 (2005).
Head, S. J. et al. The SYNTAX score and its clinical implications. Heart 100(2), 169–177 (2014).
Neumann, F.-J. et al. 2018 ESC/EACTS Guidelines on myocardial revascularization. Eur. Heart J. 40(2), 87–165. https://doi.org/10.1093/eurheartj/ehy394 (2019).
Farooq, V. et al. The CABG SYNTAX Score-an angiographic tool to grade the complexity of coronary disease following coronary artery bypass graft surgery: from the SYNTAX Left Main Angiographic (SYNTAX-LE MANS) substudy. EuroIntervention 8(11), 1277–1285 (2013).
Palmerini, T. et al. Prognostic value of the SYNTAX score in patients with acute coronary syndromes undergoing percutaneous coronary intervention: Analysis from the ACUITY (Acute Catheterization and Urgent Intervention Triage StrategY) trial. J. Am. Coll. Cardiol. 57(24), 2389–2397 (2011).
Park, S. H., Kim, B. D., Kang, C. M. et al. Sequence-to-sequence prediction of vehicle trajectory via LSTM encoder-decoder architecture. In 2018 IEEE intelligent vehicles symposium (IV). IEEE, 2018: 1672–1678.
Song, L. et al. Quantitative flow ratio-guided strategy versus angiography-guided strategy for percutaneous coronary intervention: Rationale and design of the FAVOR III China trial. Am. Heart J. 223, 72–80. https://doi.org/10.1016/j.ahj.2020.02.015 (2020).
Miller, J. M. et al. Diagnostic Performance of Coronary Angiography by 64-Row CT. N. Engl. J. Med. 359, 2324–2336. https://doi.org/10.1056/NEJMoa0806576 (2008).
The TIMI Study Group. The thrombolysis in myocardial infarction (TIMI) trial phase I findings. N. Engl. J. Med. 312, 932–936 (1985).
Kern, M. J. et al. Determination of angiographic (TIMI grade) blood flow by intracoronary doppler flow velocity during acute myocardial infarction. Circulation 94, 7. https://doi.org/10.1161/01.CIR.94.7.1545 (1996).
Xu, B. et al. Angiographic quantitative flow ratio-guided coronary intervention (FAVOR III China): a multicenter, randomized, sham-controlled trial. Lancet 398, 2149–2159 (2021).
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K. & Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, June 20–25, 2009, Miami, FL, USA, 248–255.
Zreik, M. et al. Deep learning analysis of coronary arteries in cardiac ct angiography for detection of patients requiring invasive coronary angiography. IEEE Trans. Med. Imaging. 39, 1545–1557. https://doi.org/10.1109/TMI.2019.2953054 (2019).
Fischer, A. M. et al. Accuracy of an artificial intelligence deep learning methodology implementing a recurrent neural network with long short term memory for the automated detection of calcified plaques from coronary computed tomography angiography. J. Thorac. Imaging. 35, S49–S57. https://doi.org/10.1097/RTI.0000000000000491 (2020).
Shadmi, R., Mazo, V., Bregman-Amitai, O. & Elnekave, E. Fully-convolutional deep-learning based system for coronary calcium score prediction from non-contrast chest CT. In 2018 IEEE 15th International Symposium on Biomedical Imaging, 24–28 (2018).
Zreik, M. et al. A recurrent CNN for automatic detection and classification of coronary artery plaque and stenosis in coronary CT angiography. IEEE Trans. Med. Imaging. 38, 1588–1598. https://doi.org/10.1109/TMI.2018.2883807 (2018).
Mark, D. B. et al. Continuing evolution of therapy for coronary artery disease: initial results from the era of coronary angioplasty. Circulation 89, 2015–2025 (1994).
Milea, D. et al. Artificial intelligence to detect papilledema from ocular fundus photographs. N. Engl. J. Med. 382, 1687–1695. https://doi.org/10.1056/NEJMoa1917130 (2020).
Lucas, B. D., Kanade, T. An iterative image registration technique with an application to stereo vision. In Proceedings of DARPA Image Understanding Workshop. 20th-30th April, 1981. Minneapolis; 121–130. https://doi.org/10.1042/cs0730285
Shi, J., Tomasi, C. Good features to track. In 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1994; 593–600.
Farneback, G. Two-frame motion estimation based on polynomial expansion. Scandinavian Conference on Image Analysis. Halmstad, Sweden, 29th June to 2nd July. 2003; 363–370. https://doi.org/10.1007/3-540-45103-X
Sutskever, I., Vinyals, O., Le, Q. V. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014; 2, 3104–3112.
Cho, K., Bahdanau, D., Bougares, F., Holger, S., Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. 2014. EMNLP. https://doi.org/10.3115/v1/D14-1179
Wang, X. et al. Using artificial intelligence in the development of diagnostic models of coronary artery disease with imaging markers: A scoping review. Front. Cardiovasc. Med. https://doi.org/10.3389/fcvm.2022.945451 (2022).
Alizadehsani, R. et al. Coronary artery disease detection using artificial intelligence techniques: A survey of trends, geographical differences and diagnostic features 1991–2020. Comput. Biol. Med. 128, 104095. https://doi.org/10.1016/j.compbiomed.2020.104095 (2020).
Mangla, A., Oliveros, E., Williams, K. & Kalra, D. Cardiac imaging in the diagnosis of coronary artery disease. Curr. Probl. Cardiol. 42(10), 316–366. https://doi.org/10.1016/j.cpcardiol.2017.04.005 (2017).
Jaarsma, C. et al. Diagnostic performance of noninvasive myocardial perfusion imaging using single-photon emission computed tomography, cardiac magnetic resonance, and positron emission tomography imaging for the detection of obstructive coronary artery disease: a meta-analysis. J. Am. Coll. Cardiol. 59(19), 1719–1728. https://doi.org/10.1016/j.jacc.2011.12.040 (2012).
Targ S, Almeida D, Lyman K. Resnet in resnet: Generalizing residual architectures. arXiv preprint arXiv:1603.08029, 2016.
Koonce, B. & Koonce, B. EfficientNet Convolutional neural networks with swift for Tensorflow: Image recognition and dataset categorization. Life 13(4), 109–1235 (2021).
Ribeiro, D. O. L. et al. Evolving Diagnostic and Management Advances in Coronary Heart Disease. Life https://doi.org/10.3390/life13040951 (2023).
Mite, A., Mosquera-Ceprian, R., Franco, C., Pacheco, G. & Torres, J. Recent advances in interventional cardiology for coronary artery disease: a systematic literature review. Sapienza Int. J. Interdiscip. Stud. https://doi.org/10.51798/sijis.v5isi1.767 (2024).
De Oliveira Laterza Ribeiro, M., Correia, V., De Oliveira, L., Soares, P. & Scudeler, T. Evolving diagnostic and management advances in coronary heart disease. Life https://doi.org/10.3390/life13040951 (2023).
Dodd, J. & Leipsic, J. Evolving developments in cardiac CT. Radiology, 222827. https://doi.org/10.1148/radiol.222827 (2023).
Zhang, Y. et al. Fully automated coronary artery segmentation using deep learning. J. Digit. Imaging 31(5), 711–718 (2018).
Goodfellow, I. et al. Generative adversarial nets. Adv. Neural Inf. Process. Systems (NeurIPS) 27, 2672–2680 (2014).
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 82200554) and the Young Medical Talents Training Program of Pudong Health Bureau of Shanghai (Grant No. PWRq2023-16).
Author information
Contributions
1. Literature Search: Bangde Xue, Yun Ling 2. Figures: Yanan Dai, Pengxiong Zhu 3. Study Design: Yanan Dai, Pengxiong Zhu 4. Data Collection: Xibao Shi 5. Software Implementation: Yanan Dai, Pengxiong Zhu, Yunhao Xie 6. Data Analysis: Yanan Dai, Pengxiong Zhu 7. Data Interpretation: Liang Geng 8. Writing: Yanan Dai, Pengxiong Zhu 9. Supervision: Jun Liu, Qi Zhang, Jian-Qiang Hu.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
The institutional ethics board of Rui Jin Hospital approved the study (No. 2018–15), and informed consent was obtained from all patients.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Dai, Y., Zhu, P., Xie, Y. et al. Linking sequence restoration capability of shuffled coronary angiography to coronary artery disease diagnosis. Sci Rep 15, 11413 (2025). https://doi.org/10.1038/s41598-025-95640-4
DOI: https://doi.org/10.1038/s41598-025-95640-4