Abstract
Data-driven methods for electrocardiogram (ECG) interpretation are progressing rapidly. Large datasets have enabled advances in artificial intelligence (AI)-based ECG analysis, yet limitations in annotation quality, size, and scope remain major challenges. Here we present CODE-II, a large-scale real-world dataset of 2,735,269 12-lead ECGs from 2,093,807 adult patients collected by the Telehealth Network of Minas Gerais (TNMG), Brazil. Each exam was annotated using standardized diagnostic criteria and reviewed by cardiologists. A defining feature of CODE-II is a set of 66 clinically meaningful diagnostic classes, developed with cardiologist input and routinely used in telehealth practice. We additionally provide two subsets: CODE-II-open, a publicly available subset of 15,000 patients, and CODE-II-test, a non-overlapping set of 8,475 exams reviewed by multiple cardiologists for blinded evaluation. A neural network pre-trained on CODE-II achieved superior transfer performance on external benchmarks (PTB-XL and CPSC 2018) and outperformed alternatives trained on larger datasets.
Acknowledgements
The authors thank the Telehealth Network of Minas Gerais for its long-term support for data acquisition and clinical validation, and the cardiologists and healthcare professionals involved in generating and reviewing the electrocardiographic reports. The authors also acknowledge the institutional support from the Brazilian Health Ministry, participating universities, and research centers, which enabled the development and execution of this study. This work was supported by the National Council for Scientific and Technological Development (CNPq), grants 310790/2021-2, 409604/2022-4, 443121/2023-0 and 408659/2024-6; the Minas Gerais State Foundation for Research Support (FAPEMIG), grants PPE-00030-21 and RED-00192-23; and the Secretary for Information and Digital Health (SEIDIGI) of the Brazilian Ministry of Health (TEDs 22 and 114/2024). A.H.R. is partially supported by the eSSENCE strategic collaborative research program. A.L.P.R. is supported by the Innovation Center on Artificial Intelligence for Health (CIIA-S) and the Institute for Health Assessment and Translation for Chronic and Neglected Diseases of High Relevance (IATS-CARE). P.E.O.G.B.A. is supported by a CNPq scholarship (Brazil), grants 317219/2023-5, 302087/2024-9 and 201639/2024-6. T.B.S. is partially supported by the Kjell och Märta Beijer Foundation. The funders had no role in the study design, data collection, analysis, interpretation of the results, manuscript preparation, or the decision to submit the manuscript for publication.
Funding
Open access funding provided by Uppsala University.
Ethics declarations
Competing interests
A.H.R. holds equity options in Einthoven Tecnologia LTDA and serves as a technical advisor for the company. The other authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Abreu, P.E.O.G.B., Paixão, G.M.M., Li, J. et al. CODE-II: a large-scale dataset for artificial intelligence in ECG analysis. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02704-4
DOI: https://doi.org/10.1038/s41746-026-02704-4

