Accelerating primer design for amplicon sequencing using large language model-powered agents

Abstract

The pre-trained knowledge compressed in large language models is addressing diverse scientific challenges and catalysing the progression of autonomous laboratory systems, synergized with liquid handling robots. Here we introduce PrimeGen, an orchestrated multi-agent system powered by large language models, designed to streamline labour-intensive primer design tasks for targeted next-generation sequencing. PrimeGen uses GPT-4o as a central controller to engage with experimentalists for task planning and decomposition, coordinating various specialized agents to execute distinct subtasks. These include an interactive search agent for retrieving gene targets from databases, a primer agent for designing primer sequences across multiple scenarios, a protocol agent for generating executable robot scripts through retrieval-augmented generation and prompt engineering, and an experiment agent equipped with a vision language model for detecting and reporting anomalies. We experimentally demonstrate the effectiveness of PrimeGen across a variety of applications. PrimeGen can accommodate up to 955 amplicons, ensuring high amplification uniformity and minimizing dimer formation. Our development underscores the potential of collaborative agents, coordinated by generalist foundation models, as intelligent tools for advancing biomedical research.
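The coordination pattern described in the abstract, a central controller that decomposes a request and hands each subtask to a specialized agent, can be illustrated with a minimal sketch. This is not PrimeGen's implementation: the `Subtask` type, the four stub agents, and the hard-coded `plan` function are hypothetical stand-ins, and no language model is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    agent: str    # name of the specialized agent that should handle this step
    payload: str  # free-text instruction handed to that agent


# Hypothetical stub agents; real agents would query databases, design primers,
# generate robot scripts and monitor the run.
def search_agent(payload: str) -> str:
    return f"[search] retrieved targets for: {payload}"


def primer_agent(payload: str) -> str:
    return f"[primer] designed primers for: {payload}"


def protocol_agent(payload: str) -> str:
    return f"[protocol] generated robot script for: {payload}"


def experiment_agent(payload: str) -> str:
    return f"[experiment] monitoring run for: {payload}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "search": search_agent,
    "primer": primer_agent,
    "protocol": protocol_agent,
    "experiment": experiment_agent,
}


def plan(request: str) -> List[Subtask]:
    """Stand-in for the controller's task decomposition (hard-coded, no LLM)."""
    return [Subtask(name, request) for name in ("search", "primer", "protocol", "experiment")]


def run(request: str) -> List[str]:
    """Dispatch each planned subtask to its agent and collect the reports in order."""
    return [AGENTS[task.agent](task.payload) for task in plan(request)]


if __name__ == "__main__":
    for report in run("SARS-CoV-2 whole-genome panel"):
        print(report)
```

In a real system the `plan` step would presumably be produced by the controller model in dialogue with the experimentalist; the registry lookup is only one plausible way to route subtasks.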

Fig. 1: Overview of PrimeGen workflow.
Fig. 2: The workflow of the search agent and primer agent.
Fig. 3: Experiment result analysis.
Fig. 4: The workflow of the protocol agent.
Fig. 5: Overview of the liquid handling system and anomaly detection workflow.

Data availability

Publicly available datasets were used in this study. Global health data were obtained from the World Health Organization (https://apps.who.int/iris/bitstream/handle/10665/341906/WHO-UCN-GTB-PCI-2021.7-eng.xlsx). The variant dataset comprises clinically annotated genetic variants in VCF format; it was systematically retrieved from the ClinVar database (https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/) and curated (ref. 95). UniProt datasets were accessed dynamically via the UniProt RESTful API using search criteria such as gene or protein names (query example, http://rest.uniprot.org/uniprotkb/search?query=(gene:{Gene_Names})&format=json). Genome sequences were searched and screened through the NCBI Genome database (https://ftp.ncbi.nlm.nih.gov/genomes/). Species classification information was sourced from the NCBI Taxonomy database (https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/) and curated (ref. 96). Certain datasets, such as those from the Comprehensive Antibiotic Resistance Database (CARD), were obtained upon request (https://card.mcmaster.ca/download). Data from OMIM (https://www.omim.org) and COSMIC (https://cancer.sanger.ac.uk/cosmic) require independent access requests and are not publicly redistributable. The GRCh38 (hg38) reference genome was used for expanded carrier screening (ECS) panel design, with exon regions annotated based on GENCODE v46. Nucleotide sequences for SARS-CoV-2 were retrieved from the NCBI Virus SARS-CoV-2 Data Hub (https://www.ncbi.nlm.nih.gov/activ). Multiple sequence alignment of 20,000 SARS-CoV-2 genomes was performed using MAFFT, with NC_045512.2 (Wuhan-Hu-1 isolate) as the reference genome. Source data supporting the findings of this study are provided with the paper (ref. 97).
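As an illustration of the UniProt query pattern quoted above, the following sketch fills the `{Gene_Names}` placeholder and percent-encodes the result. The function name `build_uniprot_query` is hypothetical, and joining several gene names with `OR` is an assumption about how multiple names might be combined; only the endpoint and the `query` and `format` parameters come from the statement itself. The sketch builds the URL without sending a request.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint as quoted in the data availability statement.
UNIPROT_SEARCH = "http://rest.uniprot.org/uniprotkb/search"


def build_uniprot_query(gene_names: List[str], fmt: str = "json") -> str:
    """Return a UniProtKB search URL that matches any of the given gene names."""
    if not gene_names:
        raise ValueError("at least one gene name is required")
    # Assumption: several names are combined with OR inside one parenthesized clause.
    clause = " OR ".join(f"gene:{name}" for name in gene_names)
    params = urlencode({"query": f"({clause})", "format": fmt})
    return f"{UNIPROT_SEARCH}?{params}"


if __name__ == "__main__":
    url = build_uniprot_query(["BRCA1", "TP53"])
    print(url)
    # Decode the query string again to check that it round-trips.
    print(parse_qs(urlparse(url).query)["query"][0])
```

Encoding the parameters with `urlencode` avoids the stray spaces and unescaped parentheses that can creep into hand-written query strings.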

Owing to regulatory and ethical considerations, access to certain restricted datasets may require specific approvals. The main findings of this study can be replicated using the publicly available datasets listed above. Reviewers were provided controlled access to restricted datasets for validation purposes. For further information regarding restricted data access, readers are advised to contact the relevant data repositories directly.

Code availability

PrimeGen is written in Python and packaged in a Docker container. The source code can be accessed at https://github.com/melobio/PrimeGen under the GPLv3 license. The DOI of the GitHub repository for PrimeGen is provided by the Zenodo link (ref. 98).

References

  1. Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).

  2. Bennett, J. A. et al. Autonomous reaction Pareto-front mapping with a self-driving catalysis laboratory. Nat. Chem. Eng. 1, 240–250 (2024).

  3. Slattery, A. et al. Automated self-optimization, intensification, and scale-up of photocatalysis in flow. Science 383, eadj1817 (2024).

  4. Bryant, J. A. Jr, Kellinger, M., Longmire, C., Miller, R. & Wright, R. C. AssemblyTron: flexible automation of DNA assembly with Opentrons OT-2 lab robots. Synth. Biol. 8, ysac032 (2023).

  5. Volk, A. A. et al. AlphaFlow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning. Nat. Commun. 14, 1403 (2023).

  6. Wierenga, R. P., Golas, S. M., Ho, W., Coley, C. W. & Esvelt, K. M. PyLabRobot: an open-source, hardware-agnostic interface for liquid-handling robots and accessories. Device 1, 100111 (2023).

  7. Liu, L., Huang, Y. & Wang, H. H. Fast and efficient template-mediated synthesis of genetic variants. Nat. Methods 20, 841–848 (2023).

  8. Huang, Y. et al. High-throughput microbial culturomics using automation and machine learning. Nat. Biotechnol. 41, 1424–1433 (2023).

  9. Dama, A. C. et al. BacterAI maps microbial metabolism without prior knowledge. Nat. Microbiol. 8, 1018–1025 (2023).

  10. Vemprala, S. H., Bonatti, R., Bucker, A. & Kapoor, A. ChatGPT for robotics: design principles and model abilities. IEEE Access 12, 55682–55696 (2024).

  11. Achiam, J. et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

  12. Yao, S. et al. ReAct: synergizing reasoning and acting in language models. In Eleventh International Conference on Learning Representations (ICLR, 2023).

  13. Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).

  14. Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  15. Romera-Paredes, B. et al. Mathematical discoveries from program search with large language models. Nature 625, 468–475 (2024).

  16. Yang, C. et al. Large language models as optimizers. In Twelfth International Conference on Learning Representations (ICLR, 2024).

  17. Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 33, 9459–9474 (2020).

  18. Xi, Z. et al. The rise and potential of large language model based agents: a survey. Sci. China Inf. Sci. 68, 121101 (2025).

  19. Zhou, W. et al. Agents: an open-source framework for autonomous language agents. In Twelfth International Conference on Learning Representations (ICLR, 2024).

  20. Shen, Y. et al. HuggingGPT: solving AI tasks with ChatGPT and its friends in Hugging Face. Adv. Neural Inf. Process. Syst. 36, 38154–38180 (2023).

  21. Qian, C. et al. Communicative agents for software development. Preprint at https://arxiv.org/abs/2307.07924 (2023).

  22. Wu, Q. et al. AutoGen: enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling (2024).

  23. Ghafarollahi, A. & Buehler, M. J. SciAgents: automating scientific discovery through bioinspired multi-agent intelligent graph reasoning. Adv. Mater. 37, 2413523 (2025).

  24. Yang, Z. et al. MM-REACT: prompting ChatGPT for multimodal reasoning and action. Preprint at https://arxiv.org/abs/2303.11381 (2023).

  25. Patil, S. G., Zhang, T., Wang, X. & Gonzalez, J. E. Gorilla: large language model connected with massive APIs. Adv. Neural Inf. Process. Syst. 37, 126544–126565 (2024).

  26. Messeri, L. & Crockett, M. J. Artificial intelligence and illusions of understanding in scientific research. Nature 627, 49–58 (2024).

  27. Krenn, M. et al. On scientific understanding with artificial intelligence. Nat. Rev. Phys. 4, 761–769 (2022).

  28. Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).

  29. Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).

  30. Darvish, K. et al. ORGANA: a robotic assistant for automated chemistry experimentation and characterization. Matter 8, 101897 (2025).

  31. Dai, T. et al. Autonomous mobile robots for exploratory synthetic chemistry. Nature 635, 890–897 (2024).

  32. Gao, S. et al. Empowering biomedical discovery with AI agents. Cell 187, 6125–6151 (2024).

  33. Xiao, M. et al. Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly from clinical samples. Genome Med. 12, 57 (2020).

  34. Kunasol, C. et al. Comparative analysis of targeted next-generation sequencing for Plasmodium falciparum drug resistance markers. Sci. Rep. 12, 5563 (2022).

  35. Nozawa, A. et al. Comprehensive targeted next-generation sequencing in patients with slow-flow vascular malformations. J. Hum. Genet. 67, 721–728 (2022).

  36. Rawat, A. et al. Utility of targeted next generation sequencing for inborn errors of immunity at a tertiary care centre in North India. Sci. Rep. 12, 10416 (2022).

  37. Jan, Y.-H. et al. Comprehensive assessment of actionable genomic alterations in primary colorectal carcinoma using targeted next-generation sequencing. Br. J. Cancer 127, 1304–1311 (2022).

  38. Xie, N. G. et al. Designing highly multiplex PCR primer sets with simulated annealing design using dimer likelihood estimation (SADDLE). Nat. Commun. 13, 1881 (2022).

  39. Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K. & Yao, S. Reflexion: language agents with verbal reinforcement learning. Adv. Neural Inf. Process. Syst. 36, 8634–8652 (2023).

  40. Madaan, A. et al. Self-refine: iterative refinement with self-feedback. Adv. Neural Inf. Process. Syst. 36, 46534–46594 (2023).

  41. Szymanski, N. J. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature 624, 86–91 (2023).

  42. Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F. & Hamosh, A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43, D789–D798 (2015).

  43. Sondka, Z. et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 52, D1210–D1217 (2024).

  44. Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).

  45. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).

  46. Federhen, S. The NCBI taxonomy database. Nucleic Acids Res. 40, D136–D143 (2012).

  47. The WHO Global Tuberculosis Report 2022 (WHO, 2022); https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2022

  48. McArthur, A. G. et al. The comprehensive antibiotic resistance database. Antimicrob. Agents Chemother. 57, 3348–3357 (2013).

  49. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 1–9 (2009).

  50. Wang, M. X. et al. Olivar: towards automated variant aware primer design for multiplex tiled amplicon sequencing of pathogens. Nat. Commun. 15, 6306 (2024).

  51. Xia, H. et al. MultiPrime: a reliable and efficient tool for targeted next-generation sequencing. iMeta 2, e143 (2023).

  52. Wang, K. et al. MFEprimer-3.0: quality control for PCR primers. Nucleic Acids Res. 47, W610–W613 (2019).

  53. Dreier, M., Berthoud, H., Shani, N., Wechsler, D. & Junier, P. SpeciesPrimer: a bioinformatics pipeline dedicated to the design of qPCR primers for the quantification of bacterial species. PeerJ 8, e8544 (2020).

  54. Yang, L. et al. A tool to automatically design multiplex PCR primer pairs for specific targets using diverse templates. Sci. Rep. 13, 16451 (2023).

  55. Yuan, J. et al. The web-based multiplex PCR primer design software Ultiplex and the associated experimental workflow: up to 100-plex multiplicity. BMC Genom. 22, 835 (2021).

  56. Ghezzi, H. et al. PUPpy: a primer design pipeline for substrain-level microbial detection and absolute quantification. mSphere 9, e00360-24 (2024).

  57. dnasoftware (dnasoftware); https://www.dnasoftware.com/ (2025).

  58. SantaLucia, J. Jr & Hicks, D. The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct. 33, 415–440 (2004).

  59. Untergasser, A. et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).

  60. Sinai, S. et al. AdaLead: a simple and robust adaptive greedy search algorithm for sequence design. Preprint at https://arxiv.org/abs/2010.02141 (2020).

  61. Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning (Addison Wesley Publishing Company, 1989).

  62. hCoV-2019/nCoV-2019 Version 3 Amplicon Set (ARTIC, 2020); https://artic.network/resources/ncov/ncov-amplicon-v3.pdf

  63. ARTIC v5.3.2 (ARTIC); https://github.com/quick-lab/SARS-CoV-2/blob/main/400/v5.3.2_400/pooling

  64. Quick, J. et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat. Protoc. 12, 1261–1276 (2017).

  65. Edwards, J. G. et al. Expanded carrier screening in reproductive medicine—points to consider: a joint statement of the American College of Medical Genetics and Genomics, American College of Obstetricians and Gynecologists, National Society of Genetic Counselors, Perinatal Quality Foundation, and Society for Maternal-Fetal Medicine. Obstet. Gynecol. 125, 653–662 (2015).

  66. Goldberg, J. D., Pierson, S. & Johansen Taber, K. Expanded carrier screening: what conditions should we screen for? Prenat. Diagn. 43, 496–505 (2023).

  67. Cabibbe, A. M. et al. Application of targeted next-generation sequencing assay on a portable sequencing platform for culture-free detection of drug-resistant tuberculosis from clinical samples. J. Clin. Microbiol. 58, 10–1128 (2020).

  68. Dookie, N., Khan, A., Padayatchi, N. & Naidoo, K. Application of next generation sequencing for diagnosis and clinical management of drug-resistant tuberculosis: updates on recent developments in the field. Front. Microbiol. 13, 775030 (2022).

  69. Catalogue of Mutations in Mycobacterium tuberculosis Complex and Their Association with Drug Resistance (WHO, 2021); https://www.who.int/publications/i/item/9789240028173

  70. Butler, W. R. & Guthertz, L. S. Mycolic acid analysis by high-performance liquid chromatography for identification of Mycobacterium species. Clin. Microbiol. Rev. 14, 704–726 (2001).

  71. Ni, G. et al. Novel multiplexed amplicon-based sequencing to quantify SARS-CoV-2 RNA from wastewater. Environ. Sci. Technol. Lett. 8, 683–690 (2021).

  72. Vanella, R., Kovacevic, G., Doffini, V., de Santaella, J. F. & Nash, M. A. High-throughput screening, next generation sequencing and machine learning: advanced methods in enzyme engineering. Chem. Commun. 58, 2455–2467 (2022).

  73. Nakatsu, T. et al. Structural basis for the spectral difference in luciferase bioluminescence. Nature 440, 372–376 (2006).

  74. Hashimoto, H. et al. Crystal structure of DNA polymerase from hyperthermophilic archaeon Pyrococcus kodakaraensis KOD1. J. Mol. Biol. 306, 469–477 (2001).

  75. Lunde, B. M., Magler, I. & Meinhart, A. Crystal structures of the Cid1 poly (U) polymerase reveal the mechanism for UTP selectivity. Nucleic Acids Res. 40, 9815–9824 (2012).

  76. Lu, X. et al. Enzymatic DNA synthesis by engineering terminal deoxynucleotidyl transferase. ACS Catal. 12, 2988–2997 (2022).

  77. MGI AlphaTool (MGI); https://www.mgi-tech.com/647 (2024).

  78. Khot, T. et al. Decomposed prompting: a modular approach for solving complex tasks. In Eleventh International Conference on Learning Representations (ICLR, 2023).

  79. Li, C. et al. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. Adv. Neural Inf. Process. Syst. 36, 28541–28564 (2023).

  80. Jetson Nano (NVIDIA); https://developer.nvidia.com/embedded/jetson-nano (2019).

  81. Taymans, W., Baker, S., Wingo, A., Bultje, R. S. & Kost, S. Gstreamer application development manual (1.2.3). https://gstreamer.freedesktop.org/ (2013).

  82. Hong, Y. et al. 3D-LLM: injecting the 3D world into large language models. Adv. Neural Inf. Process. Syst. 36, 20482–20494 (2023).

  83. Wang, P. et al. Qwen2-VL: enhancing vision-language model’s perception of the world at any resolution. Preprint at https://arxiv.org/abs/2409.12191 (2024).

  84. Hu, E. J. et al. LoRA: low-rank adaptation of large language models. In Tenth International Conference on Learning Representations (ICLR, 2022).

  85. Yao, S. et al. Tree of thoughts: deliberate problem solving with large language models. Adv. Neural Inf. Process. Syst. 36, 11809–11822 (2023).

  86. Zelikman, E. et al. Quiet-STaR: language models can teach themselves to think before speaking. Preprint at https://arxiv.org/abs/2403.09629 (2024).

  87. Liu, Z. et al. Inference-time scaling for generalist reward modeling. Preprint at https://arxiv.org/abs/2504.02495 (2025).

  88. Frankish, A. et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 51, D942–D949 (2023).

  89. Wright, C. F., FitzPatrick, D. R., Ware, J. S., Rehm, H. L. & Firth, H. V. Importance of adopting standardized MANE transcripts in clinical reporting. Genet. Med. 25, 100331 (2023).

  90. Morales, J. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315 (2022).

  91. SARS-CoV-2 Variants Overview (NCBI Virus, 2004–2024); https://www.ncbi.nlm.nih.gov/activ

  92. Katoh, K., Misawa, K., Kuma, K. I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).

  93. Abdin, M. et al. Phi-3 technical report: a highly capable language model locally on your phone. Preprint at https://arxiv.org/abs/2404.14219 (2024).

  94. Yao, Y. et al. MiniCPM-V: a GPT-4V level MLLM on your phone. Preprint at https://arxiv.org/abs/2408.01800 (2024).

  95. Hui, T. Gene data from Clinvar. figshare https://doi.org/10.6084/m9.figshare.28876808.v4 (2025).

  96. Hui, T. Species identification data from NCBI. figshare https://doi.org/10.6084/m9.figshare.28877087.v1 (2025).

  97. Hui, T. PrimeGen Figs. 2–4 Source Data. figshare https://doi.org/10.6084/m9.figshare.28876844.v1 (2025).

  98. melobio. melobio/PrimeGen: V1.0.1 (V1.0.1). Zenodo https://doi.org/10.5281/zenodo.15279353 (2025).

Acknowledgements

This research was supported by the National Key Research and Development Program of China (2022YFF1202200) of the Ministry of Science and Technology of the People’s Republic of China.

Author information

Contributions

M.Y. conceived the problem and designed all studies. Y.W. assisted with and oversaw the computational pipeline. Y. Hou and H.T. developed the ‘search’ and ‘experiment’ agents. H.T., Y. Hou, W.T. and Y.W. developed the ‘protocol’ agent. L.Y., W.T., H.Z. and X.L. planned and executed the library construction experiments. Y.W. and H.T. developed the ‘primer’ agent. S. Li, Y. Hou and S.C. developed the LLM ‘controller’. Y. Huang set up the cameras for video collection in VLM supervision. L.K. and Y. Hou implemented the anomaly detection module in the ‘experiment’ agent. Q.H. assisted in developing the ‘search’ and ‘primer’ agents. J.W., H.Y., D.Y. and F.M. provided strategic guidance. N.H. provided suggestions for biomedical applications. S. Lin provided suggestions for automation systems. Y.Z. provided suggestions for PCR experiment design. M.Y. and Y.W. wrote the paper, and the other authors revised it.

Corresponding authors

Correspondence to Nattiya Hirankarn or Meng Yang.

Ethics declarations

Competing interests

J.W., D.Y. and F.M. declare stock holdings in MGI. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Wei Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The dialog example for pathogen detection.

Profile pictures: pink, controller; blue, search agent; white, primer agent. The user is on the right.

Extended Data Fig. 2 The dialog example for protein mutation analysis.

Profile pictures: pink, controller; blue, search agent; white, primer agent. The user is on the right.

Extended Data Fig. 3 The dialog example for genetic disorder.

Profile pictures: pink, controller; blue, search agent; white, primer agent. The user is on the right.

Extended Data Fig. 4 The dialog example for SNP with reference.

Profile pictures: pink, controller; blue, search agent; white, primer agent. The user is on the right.

Extended Data Fig. 5 The dialog example for cancer drug target.

Profile pictures: pink, controller; blue, search agent; white, primer agent. The user is on the right.

Extended Data Fig. 6 The dialog example for whole-genome detection.

Profile pictures: pink, controller; blue, search agent; white, primer agent. The user is on the right.

Extended Data Fig. 7 The dialog example for redesigning primers.

Profile pictures: pink, controller; blue, search agent; white, primer agent. The user is on the right.

Supplementary information

Supplementary Information

Supplementary Notes, Figs. 1–5, Tables 1–8 and references.

Reporting Summary

Supplementary Data 1

The examples of typical queries and the corresponding retrieved links for the five primer design scenarios.

Supplementary Data 2

The primer pool files and the sequencing analysis data for the four benchmarking approaches for the SARS-CoV-2 task.

Supplementary Data 3

The primer pool file, the 35 associated genes and the sequencing results data for the ECS panel design task.

Supplementary Data 4

The primer pool files for the task of detecting drug resistance mutations in MTB.

Supplementary Data 5

The primer pool files for Gluc, KOD, Cid1 and TdT in the protein mutation detection task.

Supplementary Data 6

The expert-provided seed data and LLM prompts for generating training data for the two-stage fine-tuning of the VLM for detecting abnormal events.

Source data

Source Data Fig. 2

The loss values for each iteration of the panel optimization process across the four benchmarking approaches (LLM, GA, AdaLead and Greedy) in Fig. 2e(i) and Fig. 2e(ii).

Source Data Fig. 3

Statistical source data in Fig. 3.

Source Data Fig. 3

Uncropped gel images of PCR products from SARS-CoV-2 panels for ARTIC, PrimalScheme, PrimeGen and PrimeGen-Primer3; uncropped gel image of PCR products from PrimeGen ECS panel; uncropped gel images of PCR products from PrimeGen MTB panels for round 1 and round 2 design; uncropped gel image of PCR products from PrimeGen mixing (Gluc KODm Cid1 TdT) plasmid panel for round 1 design; uncropped gel image of PCR products from PrimeGen TdT plasmid panel for round 2 design.

Source Data Fig. 4

Benchmarking of the LLM base model for three scenarios: target sequence retrieval, LLM panel optimization and protocol code modification (Fig. 4e).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Y., Hou, Y., Yang, L. et al. Accelerating primer design for amplicon sequencing using large language model-powered agents. Nat. Biomed. Eng (2025). https://doi.org/10.1038/s41551-025-01455-z

  • DOI: https://doi.org/10.1038/s41551-025-01455-z
