  • Perspective

MIFA: Metadata, Incentives, Formats and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis

Abstract

Artificial intelligence (AI) methods are powerful tools for biological image analysis and processing. High-quality annotated images are key to training and developing new algorithms, but access to such data is often hindered by the lack of standards for sharing datasets. We discuss the barriers to sharing annotated image datasets and suggest specific guidelines to improve the reuse of bioimages and annotations for AI applications. These include standards on data formats, metadata, data presentation and sharing, and incentives to generate new datasets. We are sure that the Metadata, Incentives, Formats and Accessibility (MIFA) recommendations will accelerate the development of AI tools for bioimage analysis by facilitating access to high-quality training and benchmarking data.

Fig. 1: Diverse annotation types belonging to AI-ready datasets stored at the BioImage Archive and EMPIAR.
Fig. 2: Metadata modules for AI image datasets.
Fig. 3: MIFA recommendations for FAIR AI data sharing.

Data availability

Data availability is not applicable to this article as no new data were created or analyzed for this work. All used images are already publicly accessible, permissibly licensed and referenced by identifier.

Acknowledgements

To improve support for image annotations of AI-related datasets and to develop annotation standards for the community, a workshop was held with 45 community experts from various backgrounds, including data generators, annotators, curators, AI researchers, bioimage analysts and software developers. The workshop sessions resulted in a series of recommendations on four main topics: Metadata, Incentives, Formats and Accessibility (MIFA), which are described above. We are grateful to the FAIR AI workshop participants F. Ballllosera, A. Bhardwaj, J. -M. Burel, A. French, M. Hammer, D. Hensen, K. Ho, S. Jasek, I. Kemmer, J. Kriel, A. Iudin, W. Ouyang, A. Papaleo, A. Rupaningal, C. Strambio De Castillia, B. Wester, S. Weyand and G. Zaki for their insightful input and valuable contributions to the discussion.

The workshop was organized in the framework of the AI4Life project, which has received funding from the European Union’s Horizon Europe Research and Innovation Programme under grant agreement no. 101057970. Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract no. 75N91019D00024. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government.

J.M. is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation; 501864659 as part of NFDI4BIOIMAGE). M.L.J. is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC1076), the UK Medical Research Council (CC1076) and the Wellcome Trust (CC1076). G.J.K. and P.K. were supported by EMBL-EBI and the Wellcome Trust (221371/Z/20/Z). N.N. acknowledges support from the Swedish Research Council (2023-05450), Sigurd and Elsa Goljes Memorial Foundation, IngaBritt och Arne Lundberg foundation, Magnus Bergvall Foundation, and Greta and Johan Kock’s foundations. V. Ulman was supported by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (90254). C.T. and V. Ulman are supported by grant nos. 2020-225265 and 2024-342803, respectively, from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. P.P.-G. is a member of the national infrastructure France-Bioimaging supported by the French national research agency (ANR-24-INBS-0005 FBI BIOGEN). A.M.-B. is supported by Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigación (MCIN/AEI/10.13039/501100011033/), under grant PID2023-152631OB-I00.

Author information

Contributions

All authors attended the workshop and participated in the plenary and breakout room discussions. T.Z.-C. and M.H. led the writing process. F.J., A.M., J.M. and A.M.-B. (in alphabetical order) acted as chairs of the breakout rooms and summarized discussions. A.M., J.M., A.M.-B., L.A., K.B., P.B., P.G., N.G., M.L.J., G.J.K., P.K., A.K., A.K.Y., L.M., K.N., N.N., B.O., J.L.R., C.R., N.R., U.S., B.S.-S., C.T., V. Uhlmann and V. Ulman (in alphabetical order) provided comments and edited the manuscript.

Corresponding author

Correspondence to Matthew Hartley.

Ethics declarations

Competing interests

N.R. is an employee of and owns equity in scalable minds GmbH, which is a company that sells image analysis software and services. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Yingke Xu and Michele Darrow for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Box 1 and Supplementary Table 1

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zulueta-Coarasa, T., Jug, F., Mathur, A. et al. MIFA: Metadata, Incentives, Formats and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02835-8

  • DOI: https://doi.org/10.1038/s41592-025-02835-8
