Abstract
Plant genetic resources are considered a treasure trove of valuable, untapped diversity that holds the key to breeding the crops of the future. However, the use of these resources in breeding is often limited by a lack of comprehensive phenotypic characterization. The present study provides extensive historical phenotypic data from nine genebanks as a MIAPPE-compliant dataset. We compiled and curated phenotypic data from 43,293 wheat accessions, encompassing 460,399 data points across 52 traits, including the three core traits of plant height, heading time, and thousand kernel weight from seven decades. The high quality of the dataset is reflected in predominantly high heritabilities. Phenotypic data of such quantity and quality are a crucial resource for unlocking the valuable diversity of plant genetic resources for agricultural advancement.
Data availability
The dataset is available both in a MIAPPE-compliant tabular Excel format at Recherche Data Gouv [29] (https://doi.org/10.57745/Y1VWIG, version 1.2) and as a FAIR Digital Object (FDO) in the Annotated Research Context (ARC) format at e!DAL-PGP [30] (https://doi.org/10.5447/IPK/2025/11, revision 2). Associated genotypic data were generated for part of the observed germplasm. The raw sequence data are available at ENA [35] (https://www.ebi.ac.uk/ena/browser/view/PRJEB49199, https://www.ebi.ac.uk/ena/browser/view/PRJEB81686). The variant data have been deposited in EVA [36] (https://www.ebi.ac.uk/eva/?eva-study=PRJEB106343).
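For readers who want to list the raw sequence files programmatically, the following is a minimal sketch in Python. It assumes the public ENA Portal API `filereport` endpoint and the `read_run` result type with the `run_accession` and `fastq_ftp` fields. The helper names `project_accession` and `filereport_url` are hypothetical and are not part of the authors' repository. The sketch only builds the request URLs and performs no network access.

```python
from urllib.parse import urlencode

# Browser URLs copied from the Data availability statement.
ENA_BROWSER_URLS = [
    "https://www.ebi.ac.uk/ena/browser/view/PRJEB49199",
    "https://www.ebi.ac.uk/ena/browser/view/PRJEB81686",
]

# Assumption: base URL of the ENA Portal API file report endpoint.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def project_accession(browser_url: str) -> str:
    """Return the last path segment of an ENA browser URL (the project accession)."""
    return browser_url.rstrip("/").rsplit("/", 1)[-1]


def filereport_url(accession: str, fields=("run_accession", "fastq_ftp")) -> str:
    """Build a file report URL that lists the runs of one project as TSV."""
    query = {
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(query)}"


if __name__ == "__main__":
    for url in ENA_BROWSER_URLS:
        print(filereport_url(project_accession(url)))
```

Each printed URL can be opened in a browser or passed to a download tool to obtain a tab-separated table of runs for that project.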
Code availability
The code is available on GitHub (https://github.com/AGENTproject/historic_pheno_data_analysis/) and Zenodo (https://doi.org/10.5281/zenodo.15343013), and is archived in Software Heritage (https://archive.softwareheritage.org/swh:1:dir:a95895b0bfdbf346dbb31d656c9ef446c056caec). The code and artifacts from this repository are licensed under the permissive CC-BY-4.0 licence, which allows any kind of reuse on the sole condition that credit is given. The code is organised as one Jupyter R [40] notebook per crop and per genebank. The computational environment is based on Conda packages. The main packages used are the tidyverse [41] meta-package from conda-forge [42], bioconductor-multtest [43] from bioconda [44], and the proprietary asreml [45] R package. A full description of the computational environment is included in the repository to enable reproducibility of the results.
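The environment described above includes the multtest package for multiple testing, and the reference list cites Holm's sequentially rejective procedure. As an illustration only, the following Python sketch computes Holm step-down adjusted p-values. It is not the authors' code, which is written in R. That the analysis applies Holm's adjustment is an assumption here, and the function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw={p:.3f} adjusted={q:.3f}")
```

A hypothesis is rejected at level alpha when its adjusted p-value is at most alpha, which controls the family-wise error rate without assuming independence between tests.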
References
1. Milner, S. G. et al. Genebank genomics highlights the diversity of a global barley collection. Nat. Genet. 51, 319–326, https://doi.org/10.1038/s41588-018-0266-x (2019).
2. Mascher, M. et al. Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding. Nat. Genet. 51, 1076–1081, https://doi.org/10.1038/s41588-019-0443-6 (2019).
3. McCouch, S. et al. Feeding the future. Nature 499, 23–24, https://doi.org/10.1038/499023a (2013).
4. Schulthess, A. W. et al. Large-scale genotyping and phenotyping of a worldwide winter wheat genebank for its use in pre-breeding. Sci. Data 9, 784, https://doi.org/10.1038/s41597-022-01891-5 (2022).
5. Gonzalez, M. Y. et al. Unbalanced historical phenotypic data from seed regeneration of a barley ex situ collection. Sci. Data 5, 180278, https://doi.org/10.1038/sdata.2018.278 (2018).
6. Philipp, N. et al. Historical phenotypic data from seven decades of seed regeneration in a wheat ex situ collection. Sci. Data 6, 137, https://doi.org/10.1038/s41597-019-0146-y (2019).
7. Singh, S. et al. Direct introgression of untapped diversity into elite wheat lines. Nat. Food 2, 819–827, https://doi.org/10.1038/s43016-021-00380-z (2021).
8. Schulthess, A. W. et al. Genomics-informed prebreeding unlocks the diversity in genebanks for wheat improvement. Nat. Genet. 54, 1544–1552, https://doi.org/10.1038/s41588-022-01189-7 (2022).
9. Gaurav, K. et al. Population genomic analysis of Aegilops tauschii identifies targets for bread wheat improvement. Nat. Biotechnol. 40, 422–431, https://doi.org/10.1038/s41587-021-01058-4 (2022).
10. Börner, A. Preservation of plant genetic resources in the biotechnology era. Biotechnol. J. 1, 1393–1404, https://doi.org/10.1002/biot.200600131 (2006).
11. Alercia, A., Diulgheroff, S. & Mackay, M. FAO/Bioversity Multi-Crop Passport Descriptors V.2.1 [MCPD V.2.1]. https://hdl.handle.net/10568/69166 (2015).
12. Genesys PGR. https://www.genesys-pgr.org/.
13. Kotni, P., van Hintum, T., Maggioni, L., Oppermann, M. & Weise, S. EURISCO update 2023: the European Search Catalogue for Plant Genetic Resources, a pillar for documentation of genebank material. Nucleic Acids Res. 51, D1465–D1469, https://doi.org/10.1093/nar/gkac852 (2023).
14. International Board for Plant Genetic Resources (IBPGR) & Commission of the European Communities (CEC). Descriptors for Wheat (Revised). https://cropgenebank.sgrp.cgiar.org/images/file/learning_space/descriptors_wheat.pdf (1985).
15. Papoutsoglou, E. A. et al. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. New Phytol. 227, 260–273, https://doi.org/10.1111/nph.16544 (2020).
16. Pommier, C. et al. Applying FAIR Principles to Plant Phenotypic Data Management in GnpIS. Plant Phenomics 2019, 1671403, https://doi.org/10.34133/2019/1671403 (2019).
17. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018, https://doi.org/10.1038/sdata.2016.18 (2016).
18. EURISCO. http://eurisco.ecpgr.org.
19. AGENT – Activated GEnebank NeTwork. https://www.agent-project.eu/.
20. H2020 AGENT Project Fact Sheet. CORDIS, European Commission https://doi.org/10.3030/862613.
21. The AGENT consortium et al. AGENT Guidelines for dataflow. https://doi.org/10.5281/zenodo.12625359 (2024).
22. WIEWS - World Information and Early Warning System on Plant Genetic Resources for Food and Agriculture. https://www.fao.org/wiews.
23. AGENT portal. https://agent.ipk-gatersleben.de.
24. Wolstencroft, K. et al. FAIRDOMHub: a repository and collaboration environment for sharing systems biology research. Nucleic Acids Res. 45, D404–D407, https://doi.org/10.1093/nar/gkw1032 (2017).
25. Brouwer, M. matthijsbrouwer/excel-validator. https://pypi.org/project/excel-validator/ (2025).
26. Philipp, N. et al. Leveraging the Use of Historical Data Gathered During Seed Regeneration of an ex Situ Genebank Collection of Wheat. Front. Plant Sci. 9, 609, https://doi.org/10.3389/fpls.2018.00609 (2018).
27. Nobre, J. S. & Singer, J. M. Leverage analysis for linear mixed models. J. Appl. Stat. 38, 1063–1072, https://doi.org/10.1080/02664761003759016 (2011).
28. Holm, S. A Simple Sequentially Rejective Multiple Test Procedure. Scand. J. Stat. 6, 65–70 (1979).
29. Bardet, E., Floch, E. L., Etukala, J. R. & Pommier, C. Wheat historical phenotypic data from 9 European genebanks. Recherche Data Gouv https://doi.org/10.57745/Y1VWIG.
30. Lange, M., Le Floch, E., Pachipala, M. & Etukala, J. R. Curated wheat historical phenotypic data from European Genebanks. https://doi.org/10.5447/IPK/2025/11.
31. DataPLANT Community. Annotated Research Context Specification, v1.1-rfc. Zenodo https://doi.org/10.5281/ZENODO.8302662 (2023).
32. Arend, D., König, P., Junker, A., Scholz, U. & Lange, M. The on-premise data sharing infrastructure e!DAL: Foster FAIR data for faster data acquisition. GigaScience 9, giaa107, https://doi.org/10.1093/gigascience/giaa107 (2020).
33. Weil, H. L. et al. PLANTdataHUB: a collaborative platform for continuous FAIR data sharing in plant research. Plant J. 116, 974–988, https://doi.org/10.1111/tpj.16474 (2023).
34. Sansone, S.-A., Rocca-Serra, P., Gonzalez-Beltran, A., Johnson, D. & ISA Community. ISA Model and Serialization Specifications 1.0. https://doi.org/10.5281/ZENODO.163640 (2016).
35. Leinonen, R. et al. The European Nucleotide Archive. Nucleic Acids Res. 39, D28–D31, https://doi.org/10.1093/nar/gkq967 (2011).
36. Cezard, T. et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 50, D1216–D1220, https://doi.org/10.1093/nar/gkab960 (2022).
37. Alaux, M., Dyer, S. & Sen, T. Z. Wheat Data Integration and FAIRification: IWGSC, GrainGenes, Ensembl and Other Data Repositories. in The Wheat Genome (eds Appels, R., Eversole, K., Feuillet, C. & Gallagher, D.) 13–25. https://doi.org/10.1007/978-3-031-38294-9_2 (Springer International Publishing, Cham, 2024).
38. Pommier, C. et al. Reassessing data management in increasingly complex phenotypic datasets. Trends Plant Sci. S1360138525002638 https://doi.org/10.1016/j.tplants.2025.09.001 (2025).
39. Singh, N. et al. Efficient curation of genebanks using next generation sequencing reveals substantial duplication of germplasm accessions. Sci. Rep. 9, 650, https://doi.org/10.1038/s41598-018-37269-0 (2019).
40. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing (2024).
41. Wickham, H. et al. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686, https://doi.org/10.21105/joss.01686 (2019).
42. Conda-Forge Community. The conda-forge Project: Community-based Software Distribution Built on the conda Package Format and Ecosystem. https://doi.org/10.5281/zenodo.4774216 (2015).
43. Pollard, K. S., Dudoit, S. & Van Der Laan, M. J. Multiple Testing Procedures: the multtest Package and Applications to Genomics. in Bioinformatics and Computational Biology Solutions Using R and Bioconductor (eds Gentleman, R., Carey, V. J., Huber, W., Irizarry, R. A. & Dudoit, S.) 249–271. https://doi.org/10.1007/0-387-29362-0_15 (Springer New York, New York, NY, 2005).
44. Grüning, B. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods 15, 475–476, https://doi.org/10.1038/s41592-018-0046-7 (2018).
45. The VSNi Team. asreml: Fits Linear Mixed Models using REML (2023).
46. PlantBioinfoPF. Plant Bioinformatics Facility. https://doi.org/10.15454/1.5572414581735654E12.
Acknowledgements
We are sincerely grateful to Hannah F. Oertel (IPK Gatersleben) for her support on biometrics-related matters. This study was funded by the AGENT project, which received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 862613. This work was performed using services provided by the URGI bioinformatics facility [46].
Author information
Contributions
All partner genebanks digitized and curated the historical data. E.L.F. validated the data, supported by C.P. and S.W. for the definition of data collection templates, M.B. and R.D.B. for the Excel validator, J.R. for the bioinformatics helpdesk, and S.W. and S.K. for accession data validation. E.L.F. performed the analysis as an IPK visiting scientist, supported by M.O.B. and J.C.R. The CREA-CI (P.V., F.S.) and INIA (D.M.L., M.P.B., M.R.) genebanks volunteered as beta testers for the analysis workflow and provided practical help with data curation. E.L.F. and J.C.R. wrote the manuscript. M.O.B. wrote the Data analyses section. H.F.O. provided biometrics support. Each genebank described its collection in the Plant material section. C.P., M.A., S.W. and M.L. contributed to the section presenting the AGENT project phenotyping data flow. M.L., M.P. and J.R.E. integrated and curated the data as a single ISA-compliant FAIR Digital Object and published it in e!DAL-PGP. E.B. helped with data curation and publication to recherche.data.gouv.fr. J.C.R. revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Le Floch, E., Adam-Blondon, AF., Alaux, M. et al. Wheat historical phenotypic data from European genebanks as an important resource for research and breeding. Sci Data (2026). https://doi.org/10.1038/s41597-026-06908-x
DOI: https://doi.org/10.1038/s41597-026-06908-x


