Abstract
Cardiovascular diseases (CVDs) are the leading cause of death worldwide. To elucidate disease mechanisms and enable early warning of CVDs, biobanks have been established to collect genotype and electrocardiogram (ECG) data. However, only 10% of samples in the UK Biobank (UKB) contain both genotype and ECG data, which limits the utility of these biobanks. Here, we developed CapECG, an attention-based capsule network, to predict ECG traits from genotype. CapECG maps high-dimensional genotype data to low-dimensional ECG traits and improves genotype-based CVD prediction. CapECG achieved an average Pearson correlation coefficient (PCC) of 0.62 for 7422 individuals in the internal UKB test set. The model was then used to predict 169 ECG traits for 388,284 UKB individuals with genotype data only. The 169 predicted ECG traits were used to assess the risk of six types of CVD and achieved an average area under the curve (AUC) of 0.80, higher than the 0.71 obtained with a polygenic risk score-based method. A genome-wide association study (GWAS) on the predicted spatial QRS-T angle (spQRSTa) identified 133 significant single nucleotide polymorphisms (SNPs), 33 of which overlapped with a published GWAS of 118,780 individuals, compared with 13 overlaps for the observed spQRSTa of 29,692 individuals. Thus, this study proposes a new way to predict ECG traits from genotype and to support the early prediction of disease.
Data availability
The ECG data are available by request from UK Biobank, see https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access. The predicted ECG traits are used for predicting CVDs and GWAS. The GWAS summary statistics for the 11 predicted ECG traits, including effect sizes (beta) and standard errors (SE) for all SNP associations, are publicly available at Zenodo: https://zenodo.org/records/13859786.
Code availability
Our framework was implemented in Python 3.9.16 with the PyTorch 1.12.1 library and torch-geometric 2.3.1. The operating system was Ubuntu 22.04.2. More details can be found in the Methods section, the supplementary information and the code repository (https://github.com/biomed-AI/CapECG). The Mendelian randomization (MR) analyses were performed with the R packages TwoSampleMR (version 0.5.5) and GSMR (version 1.0.9). The MAGMA analysis was performed with the third-party code at https://ctg.cncr.nl/software/magma. The spatial QRS-T angle was measured using the methods published in Nature Communications 2023 (ref. 38) and Biomed Signal Process Control 2021 (ref. 64); the code is available at https://data.mendeley.com/datasets/3wb7sztrrb/1. Other ECG traits were measured using the methods published in Human Genetics 2024 (ref. 66); the code is available at https://github.com/Qimengling/ecg_traits_extraction.
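For readers reproducing the software environment, the following is a minimal sketch of a version check against the Python, PyTorch and torch-geometric versions stated above. It is not part of the CapECG repository: the function names `check_environment` and `base_version` are hypothetical, and the distribution names `torch` and `torch-geometric` are assumed to be the names under which the packages are installed.

```python
import sys
from importlib import metadata

# Versions stated in the Code availability section.
EXPECTED_PYTHON = (3, 9, 16)
# Assumption: these are the installed distribution names.
EXPECTED_PACKAGES = {
    "torch": "1.12.1",
    "torch-geometric": "2.3.1",
}


def base_version(version):
    """Drop a local build suffix such as '+cu113' from a version string."""
    return version.split("+", 1)[0]


def check_environment(installed=None, python_version=None):
    """Return a list of mismatch messages (empty if everything matches).

    `installed` optionally maps distribution name -> version string;
    when it is None, versions are looked up with importlib.metadata.
    """
    problems = []
    py = tuple(python_version or sys.version_info[:3])
    if py != EXPECTED_PYTHON:
        found_py = ".".join(map(str, py))
        want_py = ".".join(map(str, EXPECTED_PYTHON))
        problems.append(f"python: found {found_py}, expected {want_py}")
    for name, expected in EXPECTED_PACKAGES.items():
        if installed is not None:
            found = installed.get(name)
        else:
            try:
                found = metadata.version(name)
            except metadata.PackageNotFoundError:
                found = None
        if found is None:
            problems.append(f"{name}: not installed (expected {expected})")
        elif base_version(found) != expected:
            problems.append(f"{name}: found {found}, expected {expected}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the stated versions"]:
        print(line)
```

Run as a script, this prints one line per mismatch. A differing patch version is reported but need not prevent the code from running.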
References
Marijon, E., Garcia, R., Narayanan, K., Karam, N. & Jouven, X. Fighting against sudden cardiac death: need for a paradigm shift-Adding near-term prevention and pre-emptive action to long-term prevention. Eur. Heart J. 43, 1457–1464 (2022).
Miyazawa, K. et al. Cross-ancestry genome-wide analysis of atrial fibrillation unveils disease biology and enables cardioembolic risk prediction. Nat. Genet. 55, 187–197 (2023).
Yuan, S. et al. Deciphering the genetic architecture of atrial fibrillation offers insights into disease prediction, pathophysiology and downstream sequelae. medRxiv, https://doi.org/10.1101/2023.07.20.23292938 (2023).
Tcheandjieu, C. et al. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat. Med. 28, 1679–1692 (2022).
Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
AlGhatrif, M. & Lindsay, J. A brief review: history to understand fundamentals of electrocardiography. J. Commun. Hosp. Intern. Med. Perspect. 2, 14383 (2012).
Arking, D. E. et al. Genetic association study of QT interval highlights role for calcium signaling pathways in myocardial repolarization. Nat. Genet. 46, 826–836 (2014).
Young, W. J. et al. Genetic analyses of the electrocardiographic QT interval and its components identify additional loci and pathways. Nat. Commun. 13, 5144 (2022).
Van Setten, J. et al. PR interval genome-wide association meta-analysis identifies 50 loci associated with atrial and atrioventricular electrical activity. Nat. Commun. 9, 2904 (2018).
Ntalla, I. et al. Multi-ancestry GWAS of the electrocardiographic PR interval identifies 202 loci underlying cardiac conduction. Nat. Commun. 11, 2542 (2020).
Prins, B. P. et al. Exome-chip meta-analysis identifies novel loci associated with cardiac conduction, including ADAMTS6. Genome Biol. 19, 1–17 (2018).
Crotti, L. et al. NOS1AP is a genetic modifier of the long-QT syndrome. Circulation 120, 1657–1663 (2009).
Kolder, I. C. et al. Analysis for genetic modifiers of disease severity in patients with long-QT syndrome type 2. Circul. Cardiovasc. Genet. 8, 447–456 (2015).
Nauffal, V. et al. Monogenic and polygenic contributions to QTc prolongation in the population. Circulation 145, 1524–1533 (2022).
Verweij, N. et al. The genetic makeup of the electrocardiogram. Cell Syst. 11, 229–238.e225 (2020).
Glinge, C., Lahrouchi, N., Jabbari, R., Tfelt-Hansen, J. & Bezzina, C. R. Genome-wide association studies of cardiac electrical phenotypes. Cardiovasc. Res. 116, 1620–1634 (2020).
Nolte, I. M. et al. A comparison of heritability estimates by classical twin modeling and based on genome-wide genetic relatedness for cardiac conduction traits. Twin Res. Hum. Genet. 20, 489–498 (2017).
Jamshidi, Y., Nolte, I. M., Spector, T. D. & Snieder, H. Novel genes for QTc interval. How much heritability is explained, and how much is left to find? Genome Med. 2, 35 (2010).
Russell, M. W., Law, I., Sholinsky, P. & Fabsitz, R. R. Heritability of ECG measurements in adult male twins. J. Electrocardiol. 30, 64–68 (1998).
Ornella, L. et al. Genomic-enabled prediction with classification algorithms. Heredity 112, 616–626 (2014).
Liang, Y., Melia, O., Brettin, T., Brown, A. & Im, H. K. BrainXcan identifies brain features associated with behavioral and psychiatric traits using large scale genetic and imaging data. medRxiv, 2021.06.01.21258159 (2021).
Mackay, T. F. Epistasis and quantitative traits: using model organisms to study gene–gene interactions. Nat. Rev. Genet. 15, 22–33 (2014).
Prabhu, S. & Pe’er, I. Ultrafast genome-wide scan for SNP–SNP interactions in common complex disease. Genome Res. 22, 2230–2240 (2012).
Luo, X., Kang, X. & Schönhuth, A. Predicting the prevalence of complex genetic diseases from individual genotype profiles using capsule networks. Nat. Mach. Intell. 5, 114–125 (2023).
van Hilten, A. et al. GenNet framework: interpretable deep learning for predicting phenotypes from genetic data. Commun. Biol. 4, 1094 (2021).
Slatkin, M. Linkage disequilibrium—understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet. 9, 477–485 (2008).
Hinton, G. E., Sabour, S. & Frosst, N. Matrix capsules with EM routing. International Conference on Learning Representations (2018).
Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C. & Collins, J. J. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).
Wang, L. et al. An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nat. Mach. Intell. 2, 693–703 (2020).
Zhang, Z. et al. CapsNet-LDA: predicting lncRNA-disease associations using attention mechanism and capsule network based on multi-view data. Brief. Bioinforma. 24, bbac531 (2023).
Sabour, S., Frosst, N. & Hinton, G. E. Dynamic routing between capsules. Adv. Neural Inform. Process. Syst. 30 (2017).
Bland, J. M. & Altman, D. G. Measuring agreement in method comparison studies. Stat. Methods Med. Res. 8, 135–160 (1999).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inform. Process. Syst. 30, 5998–6008 (2017).
Wang, W. & Li, B. A novel model based on a 1D-ResCNN and transfer learning for processing EEG attenuation. Comput. Methods Biomech. Biomed. Eng. 26, 1980–1993 (2023).
Chen, T. & Guestrin, C. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794 (2016).
Sakaue, S. et al. Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction. Nat. Commun. 11, 1569 (2020).
Ni, G. et al. Estimation of genetic correlation via linkage disequilibrium score regression and genomic restricted maximum likelihood. Am. J. Hum. Genet. 102, 1185–1194 (2018).
Young, W. J. et al. Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease. Nat. Commun. 14, 1411 (2023).
Wharrie, S. et al. HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes. Bioinformatics 39, btad535 (2023).
1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061 (2010).
Hoffmann, T. J. et al. A large genome-wide association study of QT interval length utilizing electronic health records. Genetics 222, https://doi.org/10.1093/genetics/iyac157 (2022).
De Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Edwards, J. J. et al. Impact of fetal hypoxia on myocardial molecular profile in the lamb extra-uterine environment for neonatal development. Circulation 146, A11646–A11646 (2022).
Hartiala, J. A. et al. Genome-wide analysis identifies novel susceptibility loci for myocardial infarction. Eur. Heart J. 42, 919–933 (2021).
Reimand, J. et al. g:Profiler—a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 44, W83–W89 (2016).
van der Maarel, L. E., Postma, A. V. & Christoffels, V. M. Genetics of sinoatrial node function and heart rate disorders. Dis. Models Mech. 16, dmm050101 (2023).
Cheung, C. Y. et al. A deep-learning system for the assessment of cardiovascular disease risk via the measurement of retinal-vessel calibre. Nat. Biomed. Eng. 5, 498–508 (2021).
Vitale, G. et al. Standard ECG for differential diagnosis between Anderson-Fabry disease and hypertrophic cardiomyopathy. Heart 108, 54–60 (2022).
Koechlin, L. et al. Hyperacute T wave in the early diagnosis of acute myocardial infarction. Ann. Emerg. Med. 82, 194–202 (2023).
Sau, A. et al. Artificial intelligence-enabled electrocardiogram for mortality and cardiovascular risk estimation: a model development and validation study. Lancet Digit. Health 6, e791–e802 (2024).
Jensen, K. et al. Bringing critical race praxis into the study of electrophysiological substrate of sudden cardiac death: the ARIC study. J. Am. Heart Assoc. 9, e015012 (2020).
Zhang, X. et al. Spatial/frontal QRS-T angle predicts all-cause mortality and cardiac mortality: a meta-analysis. PLoS One 10, e0136174 (2015).
Tanigawa, Y. et al. Significant sparse polygenic risk scores across 813 traits in UK Biobank. PLoS Genet. 18, e1010105 (2022).
An, U. et al. Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. Nat. Genet. 55, 2269–2276 (2023).
Calus, M. P. & Vandenplas, J. SNPrune: an efficient algorithm to prune large SNP array and sequence datasets based on high linkage disequilibrium. Genet. Select. Evol. 50, 1–11 (2018).
Hibar, D. P. et al. Voxelwise gene-wide association study (vGeneWAS): multivariate gene-based association testing in 731 elderly subjects. Neuroimage 56, 1875–1891 (2011).
He, D. et al. Accurate identification of genes associated with brain disorders by integrating heterogeneous genomic data into a Bayesian framework. EBioMedicine 107, 105286 (2024).
Lopes, L. R. et al. Alpha-protein kinase 3 (ALPK3) truncating variants are a cause of autosomal dominant hypertrophic cardiomyopathy. Eur. Heart J. 42, 3063–3073 (2021).
Kan-o, M. et al. Mammalian formin Fhod3 plays an essential role in cardiogenesis by organizing myofibrillogenesis. Biol. Open 1, 889–896 (2012).
Fujimoto, N. et al. Transgenic expression of the formin protein fhod3 selectively in the embryonic heart: role of actin-binding activity of fhod3 and its sarcomeric localization during myofibrillogenesis. PLoS One 11, e0148472 (2016).
Li, C. et al. Genomic innovation in early life cardiovascular disease prevention and treatment. Circ. Res. 132, 1628–1647 (2023).
Alanis-Lobato, G., Cannistraci, C. V., Eriksson, A., Manica, A. & Ravasi, T. Highlighting nonlinear patterns in population genetics datasets. Sci. Rep. 5, 8140 (2015).
Taş, G. et al. Computing linkage disequilibrium aware genome embeddings using autoencoders. Bioinformatics 40, btae326 (2024).
Young, W. J. et al. A method to minimise the impact of ECG marker inaccuracies on the spatial QRS-T angle: evaluation on 1,512 manually annotated ECGs. Biomed. Signal Process. Control 64, 102305 (2021).
Wang, X., Qi, M., Zhang, H., Yang, Y. & Zhao, H. Genome-wide association and Mendelian randomization analysis provide insights into the shared genetic architecture between high-dimensional electrocardiographic features and ischemic heart disease. Hum. Genet. 143, 49–58 (2024).
Qi, M. et al. Genetic evidence for T-wave area from 12-lead electrocardiograms to monitor cardiovascular diseases in patients taking diabetes medications. Hum. Genet. 143, 1095–1108 (2024).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. International Conference on Learning Representations (2015).
He, K., Zhang, X., Ren, S. & Sun, J. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778 (2016).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Acknowledgements
The work was funded by the National Key Research and Development Program of China (2023YFF1204902), the Natural Science Foundation of China (82371482), Guangdong Basic and Applied Basic Research Foundation (2024B1515040001), Guangzhou Science and Technology Research Plan (2023A03J0659), Natural Science Foundation of Guangdong (2024A1515011363) and Guangzhou Key Laboratory of Artificial Intelligence (2024A03J0847) for data analysis and study design.
Author information
Contributions
H.Z. conceived the idea. All authors refined the experimental setup. S.L. designed the algorithm and built the model. S.L. collected and pre-processed the data and carried out the benchmark experiments. All authors prepared the figures, wrote the manuscript, critically read the manuscript and approved the final version for submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lin, S., Yang, Y. & Zhao, H. Empowering genetic discoveries and cardiovascular risk assessment by predicting electrocardiograms from genotype. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02438-3
DOI: https://doi.org/10.1038/s41746-026-02438-3


