  • Research Briefing

Enhancing functional gene set analysis with large language models

Large language models (LLMs) show potential as assistants in functional genomics, offering a new avenue for gene set analysis. In our evaluation of five LLMs, GPT-4 performed best: it proposed common functions for gene sets with high specificity, accompanied by reliable self-assessed confidence and supporting analysis, thereby complementing traditional functional enrichment.
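For contrast with the LLM approach, traditional functional enrichment is often run as an over-representation test: given a gene set, it asks whether genes annotated to a Gene Ontology (GO) term appear in the set more often than chance would predict. The following is a minimal sketch assuming a hypergeometric model over a user-supplied background gene universe. The function names `hypergeom_sf` and `enrichment_p_value`, and the toy gene identifiers, are hypothetical and are not taken from the study or from any specific tool.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass of seeing i or more annotated genes in the draw.
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def enrichment_p_value(gene_set: set[str], term_genes: set[str], background: set[str]) -> float:
    """One-sided over-representation p-value of one GO term in a gene set (hypothetical helper)."""
    # Restrict both sets to the background universe so the model is consistent.
    gene_set = gene_set & background
    term_genes = term_genes & background
    overlap = len(gene_set & term_genes)
    return hypergeom_sf(overlap, len(background), len(term_genes), len(gene_set))


# Toy example with hypothetical gene identifiers.
background = {f"G{i}" for i in range(20)}
term = {f"G{i}" for i in range(5)}          # genes annotated to one GO term
query = {"G0", "G1", "G2", "G3", "G10"}     # the gene set being analysed
p = enrichment_p_value(query, term, background)
print(f"p = {p:.4g}")
```

In practice such p-values are computed for many GO terms at once and then corrected for multiple testing; that step is omitted here to keep the sketch short.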

Fig. 1: Overview of LLM-based gene set analysis and evaluation framework.
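As a hedged illustration of what an LLM-based gene set analysis and evaluation loop might look like, the sketch below makes two simplifying assumptions. First, it assumes the model was prompted to reply in a fixed layout with `Name:` and `Confidence:` lines. Second, it scores the proposed name by how its embedding similarity to a reference GO term name ranks against its similarities to other term names. The reply layout, the functions `parse_response` and `similarity_percentile`, and the toy two-dimensional vectors (standing in for a real text-embedding model) are all hypothetical. They are not the study's method.

```python
import math
import re


def parse_response(text: str) -> tuple[str, float]:
    """Extract the proposed function name and self-assessed confidence from a reply.

    Assumes a hypothetical reply layout with 'Name:' and 'Confidence:' lines.
    """
    name = re.search(r"^Name:\s*(.+?)\s*$", text, re.MULTILINE)
    conf = re.search(r"^Confidence:\s*([0-9]*\.?[0-9]+)\s*$", text, re.MULTILINE)
    if name is None or conf is None:
        raise ValueError("reply does not follow the expected layout")
    return name.group(1), float(conf.group(1))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_percentile(generated: list[float], reference: list[float],
                          background: list[list[float]]) -> float:
    """Percentage of background term embeddings less similar to the generated
    name than the reference term is (higher means a better match)."""
    if not background:
        raise ValueError("background must not be empty")
    target = cosine(generated, reference)
    lower = sum(1 for b in background if cosine(generated, b) < target)
    return 100.0 * lower / len(background)


# Toy usage: the reply text and all vectors are made up for illustration.
reply = "Name: DNA repair\nConfidence: 0.85\nAnalysis: (free-text rationale)"
name, score = parse_response(reply)
generated_vec = [1.0, 0.0]   # toy embedding of the generated name
reference_vec = [0.9, 0.1]   # toy embedding of the reference GO term name
background_vecs = [[0.0, 1.0], [0.5, 0.5], [-1.0, 0.0], [1.0, 0.0]]
pct = similarity_percentile(generated_vec, reference_vec, background_vecs)
print(name, score, pct)
```

Ranking against a background of other term names, rather than reporting a raw similarity, makes scores comparable across gene sets whose reference terms differ in how generic their wording is.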

Additional information

This is a summary of: Hu, M. et al. Evaluation of large language models for discovery of gene set function. Nat. Methods https://doi.org/10.1038/s41592-024-02525-x (2024).

Cite this article

Enhancing functional gene set analysis with large language models. Nat Methods 22, 22–23 (2025). https://doi.org/10.1038/s41592-024-02526-w
