Enhancing functional gene set analysis with large language models

doi:10.1038/s41592-024-02526-w

Research Briefing
Published: 28 November 2024

Nature Methods volume 22, pages 22–23 (2025)

Large language models (LLMs) demonstrate potential as assistants in functional genomics, offering a new avenue for gene set analysis. In our evaluation of five LLMs, GPT-4 was the top-performing model and generated common functions for gene sets with high specificity, reliable self-assessed confidence and supporting analysis, complementing traditional functional enrichment.

**Fig. 1: Overview of LLM-based gene set analysis and evaluation framework.**

References

Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA. 102, 15545–15550 (2005). This paper presents the gene set enrichment analysis tool, providing one example of functional enrichment methods.
Joachimiak, M. P., Harry Caufield, J., Harris, N. L., Kim, H. & Mungall, C. J. Gene set summarization using large language models. Preprint at https://arxiv.org/abs/2305.13338 (2023). A preprint that introduces the use of LLMs to retrieve relevant GO terms to annotate the gene set.
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000). This article presents the GO consortium, a project to annotate genes using a common vocabulary across different organisms.
Huang, L. et al. A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions. Preprint at https://arxiv.org/abs/2311.05232 (2023). This preprint reviews ‘hallucinations’ seen in emerging LLMs and guides the approach to detect and mitigate this phenomenon.
Wang, Z. et al. GeneAgent: self-verification language agent for gene set knowledge discovery using domain databases. Preprint at https://arxiv.org/abs/2405.16205 (2024). This preprint extends our study by building a pipeline that reduces ‘hallucinations’ and improves reliability by autonomously interacting with biological databases.

This is a summary of: Hu, M. et al. Evaluation of large language models for discovery of gene set function. Nat. Methods https://doi.org/10.1038/s41592-024-02525-x (2024).

Cite this article

Enhancing functional gene set analysis with large language models. Nat Methods 22, 22–23 (2025). https://doi.org/10.1038/s41592-024-02526-w

Version of record: 28 November 2024
Issue date: January 2025