Abstract
Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers.
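As a concrete illustration of the kind of post hoc IML method surveyed here, the short sketch below applies in silico saturation mutagenesis, which scores every single-base substitution by its effect on a model's prediction, to a toy motif-matching scorer. The scoring function, motif and example sequence are invented placeholders, not the article's method; in practice the same loop would wrap a trained sequence-to-activity model.

```python
# A minimal, hypothetical sketch (not the article's method) of one widely used
# post hoc IML technique: in silico saturation mutagenesis for a sequence model.
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA sequence as a (length, 4) one-hot matrix."""
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, BASES.index(base)] = 1.0
    return out

def toy_model(x):
    """Stand-in predictor: best match of the input to a fixed 7-mer motif.
    Any trained sequence-to-activity model could be substituted here."""
    motif = one_hot("TGACTCA")  # AP-1-like motif, chosen only for illustration
    window_scores = [np.sum(x[i:i + 7] * motif) for i in range(x.shape[0] - 6)]
    return float(max(window_scores))

def saturation_mutagenesis(seq, model=toy_model):
    """Score every single-base substitution by its change in model output."""
    reference = model(one_hot(seq))
    effects = np.zeros((len(seq), 4))
    for i in range(len(seq)):
        for j, base in enumerate(BASES):
            mutant = seq[:i] + base + seq[i + 1:]
            effects[i, j] = model(one_hot(mutant)) - reference
    return effects

if __name__ == "__main__":
    sequence = "GGCATGACTCAGGTTA"
    effects = saturation_mutagenesis(sequence)
    # Positions where mutations most reduce the prediction are candidate important bases.
    importance = -effects.min(axis=1)
    for base, score in zip(sequence, np.round(importance, 2)):
        print(base, score)
```

The per-position importance profile highlights the embedded motif instance; downstream interpretation (for example, comparing such profiles against known motifs) is where the pitfalls discussed in the article arise.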
Acknowledgements
This work was supported in part by the National Institutes of Health Common Fund 4D Nucleome Program grant UM1HG011593 (to J.M.), National Institutes of Health Common Fund Cellular Senescence Network Program grant UH3CA268202 (to J.M.), National Institutes of Health grants R01HG007352 (to J.M.), R01HG012303 (to J.M.) and U24HG012070 (to J.M.), and National Science Foundation grants IIS1705121 (to A.T.), IIS1838017 (to A.T.), IIS2046613 (to A.T.) and IIS2112471 (to A.T.). J.M. was additionally supported by a Guggenheim Fellowship from the John Simon Guggenheim Memorial Foundation, a Google Research Collabs Award and a Single-Cell Biology Data Insights award from the Chan Zuckerberg Initiative. A.T. was additionally supported by funding from Meta, Morgan Stanley and Amazon. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of any of these funding agencies.
Author information
Contributions
Conceptualization, V.C., M.Y., A.T. and J.M.; investigation, V.C., M.Y., W.C., J.S.K., A.T. and J.M.; writing, V.C., M.Y., A.T. and J.M.; funding acquisition, A.T. and J.M.
Ethics declarations
Competing interests
A.T. received gift research grants from Meta, Morgan Stanley and Amazon. J.M. received a gift research grant from Google Research. A.T. works part-time for Amplify Partners. The other authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Maxwell Libbrecht and Juan Caicedo for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, V., Yang, M., Cui, W. et al. Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments. Nat Methods 21, 1454–1461 (2024). https://doi.org/10.1038/s41592-024-02359-7