npj Digital Medicine
Empowering genetic discoveries and cardiovascular risk assessment by predicting electrocardiograms from genotype
  • Article
  • Open access
  • Published: 17 February 2026

  • Siying Lin (1, 2),
  • Yuedong Yang (1) &
  • Huiying Zhao (2)

npj Digital Medicine, Article number:  (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Cardiovascular diseases
  • Computational models
  • Genome-wide association studies
  • Machine learning

Abstract

Cardiovascular diseases (CVDs) are the leading cause of death worldwide. To interpret disease mechanisms and warn of CVD risk early in life, biobanks have been established to collect genotype and electrocardiogram (ECG) data. However, only 10% of samples in UK Biobank (UKB) contain both genotype and ECG data, limiting the utility of such biobanks. Here, we developed CapECG, an attention-based capsule network, to predict ECG traits from genotype. CapECG maps high-dimensional genotypes to low-dimensional ECG traits and improves CVD prediction from genotype. CapECG achieved an average Pearson correlation coefficient (PCC) of 0.62 for 7422 individuals in the internal test set from UKB. The model was then used to predict 169 ECG traits for 388,284 UKB individuals who had only genotype data. Using these 169 predicted ECG traits to assess the risk of six types of CVD achieved an average area under the curve (AUC) of 0.80, higher than the 0.71 achieved by a polygenic risk score-based method. A genome-wide association study (GWAS) on the predicted spatial QRS-T angle (spQRSTa) identified 133 significant single nucleotide polymorphisms (SNPs), 33 of which overlapped with a published GWAS of 118,780 individuals, compared with 13 overlaps for a GWAS of the observed spQRSTa in 29,692 individuals. Thus, this study proposes a new way to predict ECG traits from genotype, helping to link genotype data to the early prediction of disease.
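
The two headline metrics in the abstract, the average PCC across ECG traits and the AUC for CVD risk, can be illustrated with a minimal pure-Python sketch. It assumes that "average PCC" means the mean of per-trait correlations between observed and predicted values on the test set, and it computes AUC with the rank-based (Mann-Whitney) formulation. The helper names (pearson, mean_trait_pcc, roc_auc) and the toy data are hypothetical; none of this is taken from the CapECG repository.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / (sx * sy)


def mean_trait_pcc(observed: List[List[float]], predicted: List[List[float]]) -> float:
    """Mean of per-trait PCCs; rows are individuals, columns are ECG traits."""
    n_traits = len(observed[0])
    pccs = [
        pearson([row[t] for row in observed], [row[t] for row in predicted])
        for t in range(n_traits)
    ]
    return sum(pccs) / n_traits


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC: the probability that a random case
    scores higher than a random control, counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(cases) * len(controls))


# Toy data: 3 individuals x 2 traits (trait 0 tracks perfectly, trait 1 is reversed).
observed = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
predicted = [[2.0, 30.0], [4.0, 20.0], [6.0, 10.0]]
print(mean_trait_pcc(observed, predicted))
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))  # -> 0.75
```

Averaging per-trait PCCs weights every ECG trait equally regardless of its scale, which is why the correlation is computed trait by trait rather than over the flattened matrix.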

Data availability

The ECG data are available on application to UK Biobank; see https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access. The predicted ECG traits were used for CVD prediction and GWAS. The GWAS summary statistics for the 11 predicted ECG traits, including effect sizes (beta) and standard errors (SE) for all SNP associations, are publicly available at Zenodo: https://zenodo.org/records/13859786.
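
Because the released summary statistics report beta and SE for every SNP, a two-sided Wald p-value can be recovered from them as a quick check. The sketch below assumes a simple (snp_id, beta, se) row layout, which is hypothetical and must be mapped from the actual Zenodo file columns, and it uses the conventional genome-wide threshold of 5e-8, which is also an assumption here rather than a value stated in this section.

```python
import math
from typing import List, Tuple

# Conventional genome-wide significance threshold (an assumption here; the
# exact threshold used in the study should be checked in its Methods).
GENOME_WIDE_ALPHA = 5e-8


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value of the Wald statistic z = beta / se under N(0, 1)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate for large |z|.
    return math.erfc(abs(z) / math.sqrt(2.0))


def significant_snps(rows: List[Tuple[str, float, float]],
                     alpha: float = GENOME_WIDE_ALPHA) -> List[str]:
    """Return the IDs of SNPs whose Wald p-value is below alpha.

    Each row is (snp_id, beta, se); this layout is hypothetical and must be
    mapped from the actual columns of the downloaded summary statistics.
    """
    return [snp for snp, beta, se in rows if wald_p_value(beta, se) < alpha]


rows = [("snp_a", 0.6, 0.1), ("snp_b", 0.1, 0.1)]  # made-up values
print(significant_snps(rows))  # -> ['snp_a']
```

Using math.erfc instead of 1 minus a normal CDF avoids the loss of precision that would otherwise round very small p-values to zero.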

Code availability

Our framework was implemented in Python 3.9.16 using PyTorch 1.12.1 and torch-geometric 2.3.1, on Ubuntu 22.04.2. More details can be found in the Methods section, the supplementary information and the code repository (https://github.com/biomed-AI/CapECG). The Mendelian randomization (MR) analyses were performed with the R packages TwoSampleMR (version 0.5.5) and GSMR (version 1.0.9). The MAGMA analysis was performed with third-party software available at https://ctg.cncr.nl/software/magma. The spatial QRS-T angle was measured using the methods published in Nature Communications 2023 (ref. 38) and Biomedical Signal Processing and Control 2021 (ref. 64); the code is available at https://data.mendeley.com/datasets/3wb7sztrrb/1. Other ECG traits were measured using the methods published in Human Genetics 2024 (ref. 66); the code is available at https://github.com/Qimengling/ecg_traits_extraction.
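
The MR analyses rely on TwoSampleMR and GSMR. As an illustration of the estimator most commonly reported in such analyses, the sketch below computes a generic fixed-effect inverse-variance weighted (IVW) estimate from per-SNP summary statistics. It is not the TwoSampleMR implementation (its mr_ivw fits a weighted regression and, as we understand it, may inflate the SE when there is heterogeneity) and not GSMR, which uses a different generalized summary-data-based estimator. The function name and the input values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) MR estimate and its SE.

    Each instrument SNP j gives a Wald ratio by_j / bx_j with first-order SE
    se_j / |bx_j|. Weighting the ratios by bx_j**2 / se_j**2 and averaging
    simplifies to sum(bx*by/se**2) / sum(bx**2/se**2).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have one entry per instrument SNP")
    num = 0.0
    den = 0.0
    for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / s ** 2
        num += bx * by * w
        den += bx * bx * w
    return num / den, 1.0 / math.sqrt(den)


# Made-up instruments whose Wald ratios all equal 0.5.
bx = [0.1, 0.2, 0.3]
by = [0.05, 0.10, 0.15]
se_y = [0.01, 0.01, 0.01]
estimate, se = ivw_estimate(bx, by, se_y)
print(round(estimate, 3), round(se, 4))
```

When every instrument gives the same Wald ratio, the IVW estimate equals that common ratio, which makes the toy input a convenient sanity check.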

References

  1. Marijon, E., Garcia, R., Narayanan, K., Karam, N. & Jouven, X. Fighting against sudden cardiac death: need for a paradigm shift – adding near-term prevention and pre-emptive action to long-term prevention. Eur. Heart J. 43, 1457–1464 (2022).

  2. Miyazawa, K. et al. Cross-ancestry genome-wide analysis of atrial fibrillation unveils disease biology and enables cardioembolic risk prediction. Nat. Genet. 55, 187–197 (2023).

  3. Yuan, S. et al. Deciphering the genetic architecture of atrial fibrillation offers insights into disease prediction, pathophysiology and downstream sequelae. medRxiv, https://doi.org/10.1101/2023.07.20.23292938 (2023).

  4. Tcheandjieu, C. et al. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat. Med. 28, 1679–1692 (2022).

  5. Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).

  6. AlGhatrif, M. & Lindsay, J. A brief review: history to understand fundamentals of electrocardiography. J. Community Hosp. Intern. Med. Perspect. 2, 14383 (2012).

  7. Arking, D. E. et al. Genetic association study of QT interval highlights role for calcium signaling pathways in myocardial repolarization. Nat. Genet. 46, 826–836 (2014).

  8. Young, W. J. et al. Genetic analyses of the electrocardiographic QT interval and its components identify additional loci and pathways. Nat. Commun. 13, 5144 (2022).

  9. Van Setten, J. et al. PR interval genome-wide association meta-analysis identifies 50 loci associated with atrial and atrioventricular electrical activity. Nat. Commun. 9, 2904 (2018).

  10. Ntalla, I. et al. Multi-ancestry GWAS of the electrocardiographic PR interval identifies 202 loci underlying cardiac conduction. Nat. Commun. 11, 2542 (2020).

  11. Prins, B. P. et al. Exome-chip meta-analysis identifies novel loci associated with cardiac conduction, including ADAMTS6. Genome Biol. 19, 1–17 (2018).

  12. Crotti, L. et al. NOS1AP is a genetic modifier of the long-QT syndrome. Circulation 120, 1657–1663 (2009).

  13. Kolder, I. C. et al. Analysis for genetic modifiers of disease severity in patients with long-QT syndrome type 2. Circ. Cardiovasc. Genet. 8, 447–456 (2015).

  14. Nauffal, V. et al. Monogenic and polygenic contributions to QTc prolongation in the population. Circulation 145, 1524–1533 (2022).

  15. Verweij, N. et al. The genetic makeup of the electrocardiogram. Cell Syst. 11, 229–238.e225 (2020).

  16. Glinge, C., Lahrouchi, N., Jabbari, R., Tfelt-Hansen, J. & Bezzina, C. R. Genome-wide association studies of cardiac electrical phenotypes. Cardiovasc. Res. 116, 1620–1634 (2020).

  17. Nolte, I. M. et al. A comparison of heritability estimates by classical twin modeling and based on genome-wide genetic relatedness for cardiac conduction traits. Twin Res. Hum. Genet. 20, 489–498 (2017).

  18. Jamshidi, Y., Nolte, I. M., Spector, T. D. & Snieder, H. Novel genes for QTc interval. How much heritability is explained, and how much is left to find? Genome Med. 2, 35 (2010).

  19. Russell, M. W., Law, I., Sholinsky, P. & Fabsitz, R. R. Heritability of ECG measurements in adult male twins. J. Electrocardiol. 30, 64–68 (1998).

  20. Ornella, L. et al. Genomic-enabled prediction with classification algorithms. Heredity 112, 616–626 (2014).

  21. Liang, Y., Melia, O., Brettin, T., Brown, A. & Im, H. K. BrainXcan identifies brain features associated with behavioral and psychiatric traits using large scale genetic and imaging data. medRxiv 2021.06.01.21258159 (2021).

  22. Mackay, T. F. Epistasis and quantitative traits: using model organisms to study gene–gene interactions. Nat. Rev. Genet. 15, 22–33 (2014).

  23. Prabhu, S. & Pe’er, I. Ultrafast genome-wide scan for SNP–SNP interactions in common complex disease. Genome Res. 22, 2230–2240 (2012).

  24. Luo, X., Kang, X. & Schönhuth, A. Predicting the prevalence of complex genetic diseases from individual genotype profiles using capsule networks. Nat. Mach. Intell. 5, 114–125 (2023).

  25. van Hilten, A. et al. GenNet framework: interpretable deep learning for predicting phenotypes from genetic data. Commun. Biol. 4, 1094 (2021).

  26. Slatkin, M. Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet. 9, 477–485 (2008).

  27. Hinton, G. E., Sabour, S. & Frosst, N. Matrix capsules with EM routing. In International Conference on Learning Representations (2018).

  28. Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C. & Collins, J. J. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).

  29. Wang, L. et al. An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nat. Mach. Intell. 2, 693–703 (2020).

  30. Zhang, Z. et al. CapsNet-LDA: predicting lncRNA-disease associations using attention mechanism and capsule network based on multi-view data. Brief. Bioinform. 24, bbac531 (2023).

  31. Sabour, S., Frosst, N. & Hinton, G. E. Dynamic routing between capsules. Adv. Neural Inform. Process. Syst. 30 (2017).

  32. Bland, J. M. & Altman, D. G. Measuring agreement in method comparison studies. Stat. Methods Med. Res. 8, 135–160 (1999).

  33. Vaswani, A. et al. Attention is all you need. Adv. Neural Inform. Process. Syst. 30, 5998–6008 (2017).

  34. Wang, W. & Li, B. A novel model based on a 1D-ResCNN and transfer learning for processing EEG attenuation. Comput. Methods Biomech. Biomed. Eng. 26, 1980–1993 (2023).

  35. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).

  36. Sakaue, S. et al. Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction. Nat. Commun. 11, 1569 (2020).

  37. Ni, G. et al. Estimation of genetic correlation via linkage disequilibrium score regression and genomic restricted maximum likelihood. Am. J. Hum. Genet. 102, 1185–1194 (2018).

  38. Young, W. J. et al. Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease. Nat. Commun. 14, 1411 (2023).

  39. Wharrie, S. et al. HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes. Bioinformatics 39, btad535 (2023).

  40. 1000 Genomes Project Consortium. A map of human genome variation from population scale sequencing. Nature 467, 1061 (2010).

  41. Hoffmann, T. J. et al. A large genome-wide association study of QT interval length utilizing electronic health records. Genetics 222, https://doi.org/10.1093/genetics/iyac157 (2022).

  42. De Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).

  43. Edwards, J. J. et al. Impact of fetal hypoxia on myocardial molecular profile in the lamb extra-uterine environment for neonatal development. Circulation 146, A11646–A11646 (2022).

  44. Hartiala, J. A. et al. Genome-wide analysis identifies novel susceptibility loci for myocardial infarction. Eur. Heart J. 42, 919–933 (2021).

  45. Reimand, J. et al. g:Profiler – a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 44, W83–W89 (2016).

  46. van der Maarel, L. E., Postma, A. V. & Christoffels, V. M. Genetics of sinoatrial node function and heart rate disorders. Dis. Models Mech. 16, dmm050101 (2023).

  47. Cheung, C. Y. et al. A deep-learning system for the assessment of cardiovascular disease risk via the measurement of retinal-vessel calibre. Nat. Biomed. Eng. 5, 498–508 (2021).

  48. Vitale, G. et al. Standard ECG for differential diagnosis between Anderson-Fabry disease and hypertrophic cardiomyopathy. Heart 108, 54–60 (2022).

  49. Koechlin, L. et al. Hyperacute T wave in the early diagnosis of acute myocardial infarction. Ann. Emerg. Med. 82, 194–202 (2023).

  50. Sau, A. et al. Artificial intelligence-enabled electrocardiogram for mortality and cardiovascular risk estimation: a model development and validation study. Lancet Digit. Health 6, e791–e802 (2024).

  51. Jensen, K. et al. Bringing critical race praxis into the study of electrophysiological substrate of sudden cardiac death: the ARIC study. J. Am. Heart Assoc. 9, e015012 (2020).

  52. Zhang, X. et al. Spatial/frontal QRS-T angle predicts all-cause mortality and cardiac mortality: a meta-analysis. PLoS One 10, e0136174 (2015).

  53. Tanigawa, Y. et al. Significant sparse polygenic risk scores across 813 traits in UK Biobank. PLoS Genet. 18, e1010105 (2022).

  54. An, U. et al. Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. Nat. Genet. 55, 2269–2276 (2023).

  55. Calus, M. P. & Vandenplas, J. SNPrune: an efficient algorithm to prune large SNP array and sequence datasets based on high linkage disequilibrium. Genet. Select. Evol. 50, 1–11 (2018).

  56. Hibar, D. P. et al. Voxelwise gene-wide association study (vGeneWAS): multivariate gene-based association testing in 731 elderly subjects. Neuroimage 56, 1875–1891 (2011).

  57. He, D. et al. Accurate identification of genes associated with brain disorders by integrating heterogeneous genomic data into a Bayesian framework. EBioMedicine 107, 105286 (2024).

  58. Lopes, L. R. et al. Alpha-protein kinase 3 (ALPK3) truncating variants are a cause of autosomal dominant hypertrophic cardiomyopathy. Eur. Heart J. 42, 3063–3073 (2021).

  59. Kan-o, M. et al. Mammalian formin Fhod3 plays an essential role in cardiogenesis by organizing myofibrillogenesis. Biol. Open 1, 889–896 (2012).

  60. Fujimoto, N. et al. Transgenic expression of the formin protein fhod3 selectively in the embryonic heart: role of actin-binding activity of fhod3 and its sarcomeric localization during myofibrillogenesis. PLoS One 11, e0148472 (2016).

  61. Li, C. et al. Genomic innovation in early life cardiovascular disease prevention and treatment. Circ. Res. 132, 1628–1647 (2023).

  62. Alanis-Lobato, G., Cannistraci, C. V., Eriksson, A., Manica, A. & Ravasi, T. Highlighting nonlinear patterns in population genetics datasets. Sci. Rep. 5, 8140 (2015).

  63. Taş, G. et al. Computing linkage disequilibrium aware genome embeddings using autoencoders. Bioinformatics btae326 (2024).

  64. Young, W. J. et al. A method to minimise the impact of ECG marker inaccuracies on the spatial QRS-T angle: evaluation on 1,512 manually annotated ECGs. Biomed. Signal Process. Control 64, 102305 (2021).

  65. Wang, X., Qi, M., Zhang, H., Yang, Y. & Zhao, H. Genome-wide association and Mendelian randomization analysis provide insights into the shared genetic architecture between high-dimensional electrocardiographic features and ischemic heart disease. Hum. Genet. 143, 49–58 (2024).

  66. Qi, M. et al. Genetic evidence for T-wave area from 12-lead electrocardiograms to monitor cardiovascular diseases in patients taking diabetes medications. Hum. Genet. 143, 1095–1108 (2024).

  67. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

  68. Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).

  69. International HapMap 3 Consortium et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).

  70. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

  71. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (2015).

  72. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 770–778 (2016).

  73. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

Acknowledgements

The work was funded by the National Key Research and Development Program of China (2023YFF1204902), the Natural Science Foundation of China (82371482), Guangdong Basic and Applied Basic Research Foundation (2024B1515040001), Guangzhou Science and Technology Research Plan (2023A03J0659), Natural Science Foundation of Guangdong (2024A1515011363) and Guangzhou Key Laboratory of Artificial Intelligence (2024A03J0847) for data analysis and study design.

Author information

Authors and Affiliations

  1. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

    Siying Lin & Yuedong Yang

  2. Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China

    Siying Lin & Huiying Zhao

Contributions

H.Z. conceived the idea. All authors refined the experimental setup. S.L. designed the algorithm, built the model, collected and pre-processed the data, and carried out the benchmark experiments. All authors prepared the figures, wrote the manuscript, critically read the manuscript and approved the final version for submission.

Corresponding authors

Correspondence to Yuedong Yang or Huiying Zhao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

supplementary tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lin, S., Yang, Y. & Zhao, H. Empowering genetic discoveries and cardiovascular risk assessment by predicting electrocardiograms from genotype. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02438-3

  • Received: 08 May 2025

  • Accepted: 05 February 2026

  • Published: 17 February 2026

  • DOI: https://doi.org/10.1038/s41746-026-02438-3

Associated content

Collection

Emerging Applications of Machine Learning and AI for Predictive Modeling in Precision Medicine

npj Digital Medicine (npj Digit. Med.)

ISSN 2398-6352 (online)

Springer Nature

© 2026 Springer Nature Limited
