Abstract
Artificial intelligence (AI) methods are powerful tools for biological image analysis and processing. High-quality annotated images are key to training and developing new algorithms, but access to such data is often hindered by the lack of standards for sharing datasets. We discuss the barriers to sharing annotated image datasets and suggest specific guidelines to improve the reuse of bioimages and annotations for AI applications. These include standards on data formats, metadata, data presentation and sharing, and incentives to generate new datasets. We expect that the Metadata, Incentives, Formats and Accessibility (MIFA) recommendations will accelerate the development of AI tools for bioimage analysis by facilitating access to high-quality training and benchmarking data.
Data availability
Data availability is not applicable to this article, as no new data were created or analyzed for this work. All images used are already publicly accessible, permissively licensed and referenced by identifier.
Acknowledgements
To improve support for image annotations of AI-related datasets and to develop annotation standards for the community, a workshop was held with 45 community experts from various backgrounds, including data generators, annotators, curators, AI researchers, bioimage analysts and software developers. The workshop sessions resulted in a series of recommendations on four main topics: Metadata, Incentives, Formats and Accessibility (MIFA), which are described above. We are grateful to the FAIR AI workshop participants F. Ballllosera, A. Bhardwaj, J.-M. Burel, A. French, M. Hammer, D. Hensen, K. Ho, S. Jasek, I. Kemmer, J. Kriel, A. Iudin, W. Ouyang, A. Papaleo, A. Rupaningal, C. Strambio De Castillia, B. Wester, S. Weyand and G. Zaki for their insightful inputs and valuable contributions to the discussion.
The workshop was organized in the framework of the AI4Life project, which has received funding from the European Union’s Horizon Europe Research and Innovation Programme under grant agreement no. 101057970. Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract no. 75N91019D00024. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government.
J.M. is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation; 501864659 as part of NFDI4BIOIMAGE). M.L.J. is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC1076), the UK Medical Research Council (CC1076) and the Wellcome Trust (CC1076). G.J.K. and P.K. were supported by EMBL-EBI and the Wellcome Trust (221371/Z/20/Z). N.N. acknowledges support from the Swedish Research Council (2023-05450), Sigurd and Elsa Goljes Memorial Foundation, IngaBritt och Arne Lundberg foundation, Magnus Bergvall Foundation, and Greta and Johan Kock’s foundations. V. Ulman was supported by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (90254). C.T. and V. Ulman are supported by grant nos. 2020-225265 and 2024-342803, respectively, from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. P.P.-G. is a member of the national infrastructure France-Bioimaging supported by the French national research agency (ANR-24-INBS-0005 FBI BIOGEN). A.M.-B. is supported by Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigación (MCIN/AEI/10.13039/501100011033/), under grant PID2023-152631OB-I00.
Author information
Contributions
All authors attended the workshop and participated in the plenary and breakout room discussions. T.Z.-C. and M.H. led the writing process. F.J., A.M., J.M. and A.M.-B. (in alphabetical order) acted as chairs of the breakout rooms and summarized discussions. A.M., J.M., A.M.-B., L.A., K.B., P.B., P.G., N.G., M.L.J., G.J.K., P.K., A.K., A.K.Y., L.M., K.N., N.N., B.O., J.L.R., C.R., N.R., U.S., B.S.-S., C.T., V. Uhlmann and V. Ulman (in alphabetical order) provided comments and edited the manuscript.
Ethics declarations
Competing interests
N.R. is an employee of and owns equity in scalable minds GmbH, which is a company that sells image analysis software and services. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Yingke Xu and Michele Darrow for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Box 1 and Supplementary Table 1
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zulueta-Coarasa, T., Jug, F., Mathur, A. et al. MIFA: Metadata, Incentives, Formats and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02835-8
DOI: https://doi.org/10.1038/s41592-025-02835-8