Abstract
The ERA5 reanalysis dataset, developed by the European Centre for Medium-Range Weather Forecasts (ECMWF), provides high-resolution, hourly global climate and weather data from 1950 to the present. However, its massive volume poses substantial storage and distribution challenges. To address this, we introduce CRA5, a highly compressed version of ERA5 generated by the neural network framework Aeolus. CRA5 reduces the 400 TB uncompressed float32 dataset to just 0.85 TB, achieving a 470× compression ratio. Notably, it offers over 100 times higher compression than the lossless GRIB files from the Copernicus Climate Data Store (CDS). Extensive experiments validate its numerical accuracy: CRA5 maintains consistent climatology and comparable power spectral density, yielding a mean absolute error of only 0.17 K for temperature across 37 vertical levels. Furthermore, it faithfully reconstructs extreme weather events and large-scale climatological patterns. By significantly lowering infrastructure barriers, CRA5 accelerates data access and facilitates broader collaboration in large-scale atmospheric research.
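The headline compression figure follows directly from the two dataset sizes quoted above. The short Python sketch below only reproduces that arithmetic; the function and variable names are illustrative assumptions and are not part of CRA5 or Aeolus.

```python
def compression_ratio(original_tb: float, compressed_tb: float) -> float:
    """Return how many times smaller the compressed dataset is."""
    if compressed_tb <= 0:
        raise ValueError("compressed size must be positive")
    return original_tb / compressed_tb


# Sizes quoted in the abstract (terabytes).
ERA5_FLOAT32_TB = 400.0  # uncompressed float32 ERA5
CRA5_TB = 0.85           # compressed CRA5

ratio = compression_ratio(ERA5_FLOAT32_TB, CRA5_TB)
# Fraction of the original storage that is no longer needed.
space_saved = 1.0 - CRA5_TB / ERA5_FLOAT32_TB

print(f"compression ratio: about {ratio:.0f}x")
print(f"storage saved: {space_saved:.2%}")
```

The computed ratio is slightly above 470, consistent with the rounded 470× figure reported in the abstract.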
Acknowledgements
This research was supported by funding from the Hong Kong RGC General Research Fund (152228/23E, 162161/24E, 162116/25E, and 162180/25E), the National Natural Science Foundation of China (NSFC) Key Program (No. 62532005), the Collaborative Research Fund (Nos. C1042-23GF and C5097-25G), the NSFC/RGC Collaborative Research Scheme (Grant Nos. 62461160332 and CRS_HKUST602/24), the Research Impact Fund (No. R5011-23F), the Areas of Excellence Scheme (AoE/E-601/22-R), and InnoHK (HKGAI). This work was also supported by the JC STEM Lab of AI for Science and Engineering, funded by The Hong Kong Jockey Club Charities Trust; the MTR Research Funding (MRF) Scheme (CHU-24003); and the Research Grants Council of Hong Kong (Project No. CUHK14213224).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Han, T., Du, D., Chen, Z. et al. CRA5 a high-fidelity compressed reanalysis atmospheric dataset for weather and climate research. Sci Data (2026). https://doi.org/10.1038/s41597-026-07381-2
DOI: https://doi.org/10.1038/s41597-026-07381-2


