Abstract
Artificial intelligence (AI) has driven transformative breakthroughs in the life sciences, expanding the capacity to interpret biological information at an unprecedented scale. To maximize the return on growing investments and accelerate progress, it is urgent to address long-standing research challenges arising from the rapid adoption of AI methods. We review the erosion of trust in AI outputs driven by poor reusability and reproducibility, and highlight their impact on environmental sustainability. Furthermore, we discuss the fragmented components of the AI ecosystem and the lack of guiding pathways to support open and sustainable AI model development. In response, this Perspective introduces practical open and sustainable AI recommendations mapped to over 300 ecosystem components and provides guiding implementation pathways. Our work connects researchers with relevant AI resources, facilitating the implementation of sustainable, reusable and reproducible AI. Built upon community consensus and aligned with existing efforts, these outputs will aid future policy development and structured pathways for guiding AI implementation.
Code availability
All data and code are made available under the Creative Commons CC BY 4.0 license: OSAI Ecosystem Interactive Website: https://osai.dome-ml.org/ai-ecosystem/; OSAI Ecosystem Data and Interactive Website Code (GitHub): https://github.com/BioComputingUP/OSAI_ecosystem/; GitHub releases: https://github.com/BioComputingUP/OSAI_ecosystem/releases/; Zenodo archived releases: https://doi.org/10.5281/zenodo.15391273 (ref. 99); Software Heritage archived releases: https://archive.softwareheritage.org/browse/directory/d1205b81b070e43af1c5d3e1493287518c5262d7/?origin_url=https://doi.org/10.5281/zenodo.15391273&path=BioComputingUP-OSAI_ecosystem-f7a6068&release=3&snapshot=d46b21f9a6057d0ea05675b2c2eb27745f93c9df.
References
Walsh, I. et al. DOME: recommendations for supervised machine learning validation in biology. Nat. Methods 18, 1122–1127 (2021).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Luo, M. et al. Artificial intelligence for life sciences: a comprehensive guide and future trends. Innov. Life 2, 100105 (2024).
Paysan-Lafosse, T. et al. The Pfam protein families database: embracing AI/ML. Nucleic Acids Res. 53, D523–D534 (2025).
Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
Kapoor, S. & Narayanan, A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns 4, 100804 (2023).
Clark, T. et al. AI-readiness for biomedical data: Bridge2AI recommendations. Preprint at bioRxiv https://doi.org/10.1101/2024.10.23.619844 (2024).
Tedersoo, L. et al. Data sharing practices and data availability upon request differ across scientific disciplines. Sci. Data 8, 192 (2021).
Laurinavichyute, A., Yadav, H. & Vasishth, S. Share the code, not just the data: a case study of the reproducibility of articles published in the Journal of Memory and Language under the open data policy. J. Mem. Lang. 125, 104332 (2022).
Alper, P. et al. RDMkit: A research data management toolkit for life sciences. Patterns 6, 101345 (2025).
Pistoia Alliance. The FAIR toolkit for life science industry. https://fairtoolkit.pistoiaalliance.org (2020).
Ouyang, W. et al. BioImage Model Zoo: a community-driven resource for accessible deep learning in bioimage analysis. Preprint at bioRxiv https://doi.org/10.1101/2022.06.07.495102 (2022).
Avsec, Ž. et al. The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nat. Biotechnol. 37, 592–600 (2019).
Akhtar, M. et al. Croissant: a metadata format for ML-ready datasets. In Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning (eds Hulsebos, M., Interlandi, M. & Shankar, S.) 1–6 (Association for Computing Machinery, 2024).
Research Data Alliance. RDA FAIR for Machine Learning (FAIR4ML) Interest Group. https://www.rd-alliance.org/groups/fair-machine-learning-fair4ml-ig/activity (2022).
Beam, A. L., Manrai, A. K. & Ghassemi, M. Challenges to the reproducibility of machine learning models in health care. JAMA 323, 305–306 (2020).
Unsal, S. et al. Learning functional properties of proteins with language models. Nat. Mach. Intell. 4, 227–245 (2022).
Sapkota, R., Roumeliotis, K. I. & Karkee, M. AI agents vs. agentic AI: A conceptual taxonomy, applications and challenges. Inf. Fusion 126, 103599 (2026).
Schwartz, R., Dodge, J., Smith, N. A. & Etzioni, O. Green AI. Commun. ACM 63, 54–63 (2020).
White, M. et al. The Model Openness Framework: promoting completeness and openness for reproducibility, transparency, and usability in artificial intelligence. Preprint at https://doi.org/10.48550/arXiv.2403.13784 (2024).
Lekadir, K. et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 388, e081554 (2025).
Kapoor, S. et al. REFORMS: consensus-based recommendations for machine-learning-based science. Sci. Adv. 10, eadk3452 (2024).
Machine Learning Commons. MLCommons: better AI for everyone. https://mlcommons.org (2025).
FAIR Advanced Research and Reproducibility (FARR) Research Coordination Network. FARR RCN. https://www.farr-rcn.org (2025).
Rai, A. Explainable AI: from black box to glass box. J. Acad. Mark. Sci. 48, 137–141 (2020).
Afroogh, S., Akbari, A., Malone, E., Kargar, M. & Alambeigi, H. Trust in AI: progress, challenges, and future directions. Humanit. Soc. Sci. Commun. 11, 1568 (2024).
Leslie, D. Understanding Artificial Intelligence Ethics and Safety: a Guide for the Responsible Design and Implementation of AI Systems in the Public Sector (The Alan Turing Institute, 2019).
Dignum, V. Responsible artificial intelligence: from principles to practice. Preprint at https://doi.org/10.48550/arXiv.2205.10785 (2022).
Ahdritz, G. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat. Methods 21, 1514–1524 (2024).
Collins, G. S. et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 384, e078378 (2024).
Schmied, C. et al. Community-developed checklists for publishing images and image analyses. Nat. Methods 21, 170–181 (2024).
Liu, X. et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat. Med. 26, 1364–1374 (2020).
Cruz Rivera, S. et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat. Med. 26, 1351–1363 (2020).
Kaggle. Kaggle: your machine learning and data science community. https://www.kaggle.com (2025).
Wolf, T. et al. HuggingFace’s Transformers: state-of-the-art natural language processing. Preprint at https://doi.org/10.48550/arXiv.1910.03771 (2019).
Turon, G., Legese, A., Arora, D. & Duran-Frigola, M. Ersilia Model Hub: a repository of AI/ML models for infectious and neglected tropical diseases. Zenodo https://doi.org/10.5281/ZENODO.7274645 (2025).
European Organization For Nuclear Research (CERN) & OpenAIRE. Zenodo https://doi.org/10.25495/7GXK-RD71 (2013).
Leo, S. et al. Recording provenance of workflow runs with RO-Crate. PLoS ONE 19, e0309210 (2024).
Huerta, E. A. et al. FAIR for AI: an interdisciplinary and international community building perspective. Sci. Data 10, 487 (2023).
Castro, L. J. et al. FAIR4ML-schema. Zenodo https://doi.org/10.5281/ZENODO.14002310 (2024).
Pistoia Alliance. Pistoia Alliance organisation website. https://www.pistoiaalliance.org (2025).
Open Data Institute. A framework for AI-ready data. https://theodi.hacdn.io/media/documents/A_framework_for_AI-ready_data.pdf (2025).
Scientific Computing World. Pistoia Alliance launches DataFAIRy to drive AI adoption. https://www.scientific-computing.com/news/pistoia-alliance-launches-datafairy-drive-ai-adoption (2024).
Desai, A., Abdelhamid, M. & Padalkar, N. R. What is reproducibility in artificial intelligence and machine learning research? AI Mag. 46, e70004 (2025).
Carter, R. E., Attia, Z. I., Lopez-Jimenez, F. & Friedman, P. A. Pragmatic considerations for fostering reproducible research in artificial intelligence. NPJ Digit. Med. 2, 42 (2019).
Tiwari, D. D. et al. BioModelsML: building a FAIR and reproducible collection of machine learning models in life sciences and medicine for easy reuse. Preprint at bioRxiv https://doi.org/10.1101/2023.05.22.540599 (2023).
Merkel, D. Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014, 2 (2014).
Anaconda. Conda https://anaconda.org/anaconda/conda (2025).
Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
Köster, J. & Rahmann, S. Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
The Galaxy Community. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res. 52, W83–W94 (2024).
Heil, B. J. et al. Reproducibility standards for machine learning in the life sciences. Nat. Methods 18, 1132–1135 (2021).
Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform Ch. 7, 59–64 (Apress, 2019).
Anthony, L. F. W., Kanding, B. & Selvan, R. Carbontracker: tracking and predicting the carbon footprint of training deep learning models. Preprint at https://doi.org/10.48550/arXiv.2007.03051 (2020).
Ritchie, H. et al. Hardware and energy cost to train notable AI systems. Our World in Data https://ourworldindata.org/grapher/hardware-and-energy-cost-to-train-notable-ai-systems (2023).
Gailhofer, P. et al. The Role of Artificial Intelligence in the European Green Deal (European Parliament, 2023).
Bolón-Canedo, V. et al. A review of green artificial intelligence: towards a more sustainable future. Neurocomputing 599, 128096 (2024).
EMBL. Sustainability: reports and resources. https://www.embl.org/about/info/sustainability/reports-resources (2025).
Yamada, T. et al. Frugal machine learning: making AI more efficient, accessible, and sustainable. Preprint at https://doi.org/10.36227/techrxiv.173385981.11102720/v1 (2024).
Tornede, T. et al. Towards green automated machine learning: status quo and future directions. J. Artif. Intell. Res. 77, 427–457 (2023).
Johnson, S. G., Simon, G. & Aliferis, C. Regulatory aspects and ethical legal societal implications (ELSI). In Artificial Intelligence and Machine Learning in Health Care and Medical Sciences (eds Simon, G. J. & Aliferis, C.) Ch. 16, 659–692 (Springer, 2024).
Jefferson, E. et al. GRAIMatter: guidelines and resources for AI model access from TrusTEd research environments (GRAIMatter). Int. J. Popul. Data Sci. 7, 2005 (2022).
European Commission. AI for Health: evaluation of applications & datasets (AHEAD). CORDIS https://cordis.europa.eu/project/id/101183031 (2024).
European Commission. HORIZON Europe: ELIXIR-STEERS project. CORDIS https://cordis.europa.eu/project/id/101131096 (2024).
SustAInML. Sustainable AI and Machine Learning. https://sustainml.eu (2021).
Software Sustainability Institute. Green DiSC: a digital sustainability certification. https://www.software.ac.uk/GreenDiSC (2025).
Geoscience and Remote Sensing Society (GRSS). GeoCroissant: a metadata framework for geospatial ML-ready datasets. https://www.grss-ieee.org/events/geocroissant-a-metadata-framework-for-geospatial-ml-ready-datasets (2024).
Mitchell, M. et al. Model cards for model reporting. In Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency (eds Friedler, S. A. & Wilson, C.) 220–229 (Association for Computing Machinery, 2019).
Pushkarna, M., Zaldivar, A. & Kjartansson, O. Data cards: purposeful and transparent dataset documentation for responsible AI. Preprint at https://doi.org/10.48550/ARXIV.2204.01075 (2022).
Dasoulas, I., Yang, D. & Dimou, A. MLSea: a semantic layer for discoverable machine learning. In The Semantic Web (eds Meroño Peñuela, A. et al.) Ch. 11, 178–198 (Springer, 2024).
SciLifeLab Data Centre. SciLifeLab: funder requirements and FAIR ML models. https://serve.scilifelab.se/docs/model-serving/fair (2025).
Van Geest, G. et al. Using Glittr.org to find, compare and re-use online materials for training and education. PLoS ONE 19, e0308729 (2024).
Data Carpentry. Data Carpentry lessons. https://datacarpentry.org/lessons (2025).
The Turing Way Community. The Turing way: a handbook for reproducible, ethical and collaborative research. Zenodo https://doi.org/10.5281/ZENODO.15213042 (2025).
ONNX. ONNX: Open Neural Network Exchange. https://onnx.ai/ (2025).
Attafi, O. A. et al. DOME registry: implementing community-wide recommendations for reporting supervised machine learning in biology. GigaScience 13, giae094 (2024).
Kurtzer, G. M., Sochat, V. & Bauer, M. W. Singularity: scientific containers for mobility of compute. PLoS ONE 12, e0177459 (2017).
Docker. Docker Hub container image library. https://hub.docker.com (2025).
Yuen, D. et al. The Dockstore: enhancing a community platform for sharing reproducible and accessible computational protocols. Nucleic Acids Res. 49, W624–W632 (2021).
Clyburne-Sherin, A., Fei, X. & Green, S. A. Computational reproducibility via containers in psychology. Meta Psychol. 3, 892 (2019).
Kryshtafovych, A. et al. Critical assessment of methods of protein structure prediction (CASP): round XV. Proteins 91, 1539–1549 (2023).
Xiong, Z. et al. Crowdsourced identification of multi-target kinase inhibitors for RET- and TAU- based disease: the Multi-Targeting Drug DREAM Challenge. PLoS Comput. Biol. 17, e1009302 (2021).
Capella-Gutierrez, S. et al. Lessons learned: recommendations for establishing critical periodic scientific benchmarking. Preprint at bioRxiv https://doi.org/10.1101/181677 (2017).
Ash, J. T. & Adams, R. P. On warm-starting neural network training. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 3884–3894 (Curran Associates, 2020).
Tmamna, J. et al. Pruning deep neural networks for green energy-efficient models: a survey. Cogn. Comput. 16, 2931–2952 (2024).
Krishnan, S. & Faust, A. Quantization for fast and environmentally sustainable reinforcement learning. Google Research Blog https://research.google/blog/quantization-for-fast-and-environmentally-sustainable-reinforcement-learning (2021).
Yuan, Y. et al. The impact of knowledge distillation on the energy consumption and runtime efficiency of NLP models. In Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering – Software Engineering for AI (eds Cleland-Huang, J., Bosch, J., Muccini, H. & Lewis, G. A.) 129–133 (Association for Computing Machinery, 2024).
Tabbakh, A. et al. Towards sustainable AI: a comprehensive framework for Green AI. Discov. Sustain. 5, 408 (2024).
Guo, D. et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 633–638 (2025).
Green Software Foundation. Green software patterns. https://patterns.greensoftware.foundation (2025).
Green Software Foundation. Green Software Foundation. https://greensoftware.foundation (2025).
TOP500.org. Green500 List: November 2023. https://top500.org/lists/green500/2023/11 (2023).
Performance Optimisation and Productivity Centre of Excellence in HPC. https://pop-coe.eu (2025).
Schmidt, V. et al. Machine learning CO2 impact calculator. https://mlco2.github.io/impact (2025).
GitHub. Official Repository of MICCAI FLARE Challenges. https://github.com/JunMa11/FLARE (2025).
Henderson, P. et al. Towards the systematic reporting of the energy and carbon footprints of machine learning. J. Mach. Learn. Res. 21, 10039–10081 (2020).
Ravi, N. et al. FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy. Sci. Data 9, 657 (2022).
Farrell, G. OSAI ecosystem components data. Zenodo https://doi.org/10.5281/zenodo.15391273 (2025).
RSQKit Community. Research software quality kit (RSQKit). Zenodo https://doi.org/10.5281/zenodo.14923572 (2025).
Gavriilidis, G. I. et al. APNet, an explainable sparse deep learning model to discover differentially active drivers of severe COVID-19. Bioinformatics 41, btaf063 (2025).
D’Anna, F. et al. A research data management (RDM) community for ELIXIR. F1000Res. 13, 230 (2024).
BY-COVID. Infectious Diseases Toolkit (IDTk). https://www.infectious-diseases-toolkit.org (2025).
Mungall, C. Open knowledge bases in the age of generative AI. F1000Res. https://doi.org/10.7490/F1000RESEARCH.1120248.1 (2025).
Yiyao, L. et al. OmicsNavigator: an LLM-driven multi-agent system for autonomous zero-shot biological analysis in spatial omics. Preprint at bioRxiv https://doi.org/10.1101/2025.07.21.665821 (2025).
Huang, K. et al. Biomni: a general-purpose biomedical AI agent. Preprint at bioRxiv https://doi.org/10.1101/2025.05.30.656746 (2025).
Wei, J. et al. From AI for science to agentic science: a survey on autonomous scientific discovery. Preprint at https://doi.org/10.48550/arXiv.2508.14111 (2025).
Kim, J. et al. The cost of dynamic reasoning: demystifying AI agents and test-time scaling from an AI infrastructure perspective. Preprint at https://doi.org/10.48550/arXiv.2506.04301 (2025).
European Commission. The EU AI Act. https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai (2024).
National Science Foundation. National Artificial Intelligence Research Resource (NAIRR) pilot. https://www.nsf.gov/focus-areas/artificial-intelligence/nairr (2024).
The White House. America’s AI action plan. https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf (2025).
Declaration on Research Assessment (DORA). https://sfdora.org/about-dora (2025).
CoARA. Coalition for Advancing Research Assessment. https://coara.org (2025).
Wang, Y. et al. SimpleFold: folding proteins is simpler than you think. Preprint at https://doi.org/10.48550/arXiv.2509.18480 (2025).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
AlphaFold3: why did Nature publish it without its code? Nature 629, 728 (2024).
Callaway, E. AI protein-prediction tool AlphaFold3 is now more open. Nature 635, 531–532 (2024).
Global Alliance for Genomics & Health (GA4GH). https://www.ga4gh.org (2025).
Pascucci, E. et al. Progressing towards personalised medicine: the Genomic Data Infrastructure (GDI) project. Eur. J. Public Health 34, ckae144.1956 (2024).
Heredia, I. et al. AI4EOSC: a federated cloud platform for artificial intelligence in scientific research. Preprint at https://arxiv.org/abs/2512.16455 (2025).
Acknowledgements
This work has been supported by ELIXIR, the European infrastructure for life science data. Additional funding was provided by the European Union through the NextGenerationEU PNRR project ELIXIRxNextGenIT (grant agreement no. IR0000010, to S.C.E.T.); the Horizon Europe projects EVERSE (grant agreement no. 101129744, to S.C.E.T., C.G., T.V. and F.P.) and ELIXIR STEERS (grant agreement no. 101131096, to S.C.E.T., C.G., T.V. and F.P.); and COST Action ML4NGP (grant agreement no. CA21160, to S.C.E.T.), supported by COST (European Cooperation in Science and Technology). Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Author information
Contributions
All authors contributed to the OSAI ecosystem and subsequent discussion as well as the initial draft of this manuscript. G.F. wrote the final draft with the help of the co-authors. All authors edited and refined the final manuscript. G.F., F.P. and S.C.E.T. initiated and coordinated the project.
Ethics declarations
Competing interests
R.F.-D. is an Industrial PhD candidate at University College Dublin with an affiliation to IBM Research Dublin, which supports joint PhD research between both sites. L.C. is the director of SequenceAnalysis.co.uk. These affiliations are commercial in nature; however, the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. All other authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Jian Ma, Benjamin Haibe-Kains and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Farrell, G., Adamidi, E., Andrade Buono, R. et al. Open and sustainable AI: challenges, opportunities and the road ahead in the life sciences. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03037-6
DOI: https://doi.org/10.1038/s41592-026-03037-6


